INTRODUCTION


Dependable  communications  capabilities  may  well  make  the  critical 
difference  between  success  and  disaster  in  military  crisis  situations,  and 
are  essential  for  effective  operation  of  the  DoD  in  peacetime.  Maintaining 
U.S.  military  communications  assets  in  a  continual  state  of  readiness  is 
therefore  an  area  of  major  concern.  The  term  "System  Control"  embraces  all 
the  management  and  control  functions  that  must  be  performed  to  maintain 
peak  communications  readiness  over  the  lifetime  of  communications 
facilities,  including  such  tasks  as  operating  and  administering  the  various 
networks in the Defense Communications System (DCS), repairing and restoring
failed links, and responding to communications emergencies. These
functions are coordinated by a hierarchical worldwide System Control
organization headquartered at the Defense Communications Agency (DCA) in
Washington.

The  basic  nature  of  the  System  Control  structure,  little  changed  in 
decades  (except  for  gradual  modernization  of  equipment),  relies  heavily 
upon  the  skills  and  experience  of  the  large  number  of  personnel  manning  the 
control facilities. This leads inevitably to a number of problems:
(1) chronic shortages of skilled personnel; (2) high training costs;
(3) continual increases in equipment sophistication, hence in personnel
skill  requirements;  and  (4)  continual  increases  in  status  and  control 
information  flow  from  modern  computerized  communications  equipment,  hence 
in  requirements  for  personnel  to  absorb,  assess  and  respond  to  masses  of 
data. 

The  goal  of  the  Knowledge-Based  Systems  Analysis  and  Control  Program 
at Lincoln Laboratory is to develop solutions for these problems through
application of Machine Intelligence technology. The character of the
problems  varies  with  level  in  the  System  Control  hierarchy,  and  with  the 
type of facilities being controlled: a Tech Control Facility (TCF) handling
dedicated  circuits  is  very  different  from  a  Defense  Switched  Network  (DSN) 
control  center  handling  voice  traffic  flow  in  a  network  of  computerized 
digital  circuit  switches,  for  example,  and  yet  another  set  of  needs  exists 
at  the  regional  and  higher-level  control  centers.  Lincoln's  program 
currently  includes  an  ongoing  Expert  System  development  aimed  at  Tech 
Control,  described  in  Sections  2  and  3  of  this  report,  and  a  study  of 
future  applications  of  Machine  Intelligence  at  higher  System  Control 
levels,  reported  in  Section  4.  In  addition  to  the  Tech  Control  project 
plans  described  in  Section  5,  the  FY87  program  will  include  a  new  DSN 
control  simulation  and  Expert  System  design  effort  which  will  build  upon 
earlier  Lincoln  work. 

The  time  scale  of  the  Lincoln  program  is  consistent  with  a  1990s-era 
deployment  of  Machine  Intelligence  adjuncts  to  System  Control.  An 
engineering  model  of  the  Tech  Control  expert  system  has  been  under 
development  throughout  FY86,  and  will  be  demonstrated  in  the  field  in  the 
first  half  of  FY87  (as  described  in  Section  5);  completion  is  expected  by 
the  end  of  FY88,  at  which  time  its  features  and  performance  can  be 
incorporated  into  a  specification  for  commercial  procurement.  The  DSN 
Control  Expert  System  development  effort  beginning  in  FY87  will  take  longer 
because  it  embodies  some  challenging  problems,  as  discussed  below,  but  it 
is  nonetheless  reasonable  to  anticipate  commercial  procurement  within  a 
5-10  year  time  frame.  Given  suitable  sponsorship  and  support,  it  is  also 
feasible in the same time frame (in view of current technology) to
implement other higher-level System Control concepts as discussed in
Section  4. 

1.1  Tech  Control  Problem  Definition 

The  purpose  of  this  subsection  is  to  set  the  stage  for  the  detailed 
technical  description  of  the  "Expert  Tech  Controller"  in  Section  2.  We 
first  describe  the  Tech  Control  Facility  (TCF)  problem  domain  and  the  way 
it  is  presently  handled  by  skilled  staff  personnel;  we  show  the  problem's 
appropriateness  as  a  classic  near-term  application  of  Expert  System 
techniques;  and  we  then  describe  the  functions  that  the  objective  Expert 
System  will  perform  for  its  human  operators. 

Tech  Control  deals  with  full-time  dedicated  circuits,  and  effectively 
resides  at  the  foundation  layer  of  the  System  Control  hierarchy.  There  are 
some  61,000  dedicated  circuits  in  the  worldwide  DCS;  many  of  them  furnish 
full-time  connectivity  for  specific  critical  users,  while  others 
(especially  overseas)  provide  transmission  services  for  various  networks  of 
the  DCS.  Each  dedicated  circuit  is  served  by  a  Tech  Control  Facility  at 
each  end,  and  generally  by  one  or  more  additional  TCFs  at  intermediate 
points  en  route.  There  are  about  400  TCFs,  many  of  them  handling  up  to 
1,000  circuits  or  more;  most  are  overseas,  where  the  U.S.  military  tends  to 
own  and  manage  circuits  directly.  In  CONUS,  by  contrast,  most  military 
dedicated  circuits  are  provided  and  serviced  by  commercial  vendors. 

A  Tech  Control  Facility  typically  has  circuits  ranging  from  75-baud 
teletype  links  to  broadband  microwave  carriers.  The  transmission  facilities 
typically  include  a  wide  variety  of  media  installed  over  a  period  of  years, 
from old unconditioned telephone lines, to analog and digital multiplexed
facilities, to modern fiber optics, microwave and satellite links.
Similarly,  the  equipment  within  a  TCF  ranges  in  age  and  complexity  from 
1950s-era  analog  carrier  gear  to  the  latest  microprocessor-controlled  test 
and  multiplexing  equipment. 

The  primary  duties  of  the  staff  at  a  TCF  (typically  in  three  shifts 
around  the  clock,  totalling  60  or  more  personnel  at  the  busier  sites) 
include:  (1)  rapid  restoral  of  service  whenever  outages  occur,  (2)  routine 
circuit  test  and  maintenance,  and  (3)  planning  and  implementation  of 
circuit  changes  and  additions  pursuant  to  DCA  orders,  in  response  to 
changing  user  requirements.  Category  (1)  is  the  most  critical:  each 
circuit has a "restoral priority", depending on the mission it serves.
The most important circuits have spare facilities that can be activated in
the  event  of  failure,  or  can  pre-empt  facilities  in  use  by  circuits  of 
lesser  priority,  while  others  are  "logged  out"  until  repairs  are 
completed.  In  any  case,  the  Tech  Controllers  proceed  as  rapidly  as 
possible  to  do  "fault  isolation"  (i.e.,  to  identify  the  failed  subsystem) 
and  arrange  for  repairs.  During  this  process  they  prepare  DD  Form  1443,  an 
outage  report  detailing  the  history  of  the  problem;  this  report  is 
ultimately  transmitted  up  the  System  Control  hierarchy. 
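
The restoral choices described above (activate a spare, pre-empt a lesser-priority circuit, or log the circuit out) can be sketched as a small decision function. This is an illustration only, not a procedure taken from any DCA directive: the names `Circuit` and `choose_restoral_action`, and the convention that a lower number means a more urgent restoral priority, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Circuit:
    ident: str
    restoral_priority: int          # assumption: 1 is the most urgent
    facility: Optional[str] = None  # transmission facility in use, if any


def choose_restoral_action(failed, spare_facilities, active_circuits):
    """Return (action, detail) for a failed circuit.

    1. Use a spare facility if one is available.
    2. Otherwise pre-empt the facility of the least important active
       circuit, provided it ranks strictly below the failed circuit.
    3. Otherwise log the circuit out until repairs are completed.
    """
    if spare_facilities:
        return ("activate_spare", spare_facilities[0])
    candidates = [c for c in active_circuits
                  if c.facility is not None
                  and c.restoral_priority > failed.restoral_priority]
    if candidates:
        victim = max(candidates, key=lambda c: c.restoral_priority)
        return ("preempt", victim.ident)
    return ("log_out", failed.ident)
```

In this sketch a circuit with no spare and no lesser-priority circuit to displace is simply logged out, which mirrors the fallback described in the text.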

The  second  category  of  Tech  Control  duties  is  typically  tedious  and 
slow,  involving  repetitive  tests,  measurements  and  record-keeping  on  large 
numbers  of  circuits.  Some  automation  has  been  introduced,  but  human 
supervision is required in managing and interpreting the results.
Typically these duties are carried out by the more junior staff, with
skilled  instruction  and  assistance  as  necessary. 


A  TCF  may  have  several  people  assigned  full-time  to  the  third  category 
of  duties,  since  circuit  configuration  changes  occur  often  enough  that 
there  may  typically  be  several  tens  of  them  in  the  process  of  planning  and 
preparation.  The  process  begins  when  a  user  agency  submits  a 
"Telecommunications  Service  Request"  (TSR)  to  the  DCA  stating  a  requirement 
for  a  circuit  change  or  addition,  and  the  DCA  responds  by  selecting  the 
routing  and  facilities  to  be  installed  or  re-allocated  to  meet  the 
requirement.  These  are  specified  in  a  highly  formatted  "Telecommunications 
Service  Order"  (TSO)  sent  to  all  TCFs  affected  by  the  change;  each  of  the 
latter  carries  out  detailed  planning  and  preparation  of  the  necessary 
in-house  changes,  and  all  parties  activate  the  new  circuit  on  a 
predetermined date. As part of this process, each TCF prepares a DD Form
1441  card  for  the  new  circuit,  containing  dozens  of  items  of  information 
about  it,  which  becomes  a  part  of  the  master  card  file  that  constitutes  the 
data  base  of  the  TCF. 

Pervading  all  of  these  TCF  activities  are  manpower  and  training 
problems  of  major  proportions.  TCFs  are  staffed  by  active  duty  military 
personnel,  most  of  whom  are  relatively  young  and  inexperienced:  as  soon  as 
personnel  acquire  telecommunications  skills,  they  are  eagerly  sought  by 
civilian  companies  offering  good  pay  and  stable  jobs.  Besides  lacking 
required  skills  (the  training  schools  are  very  basic),  new  arrivals  at  a 
TCF  face  an  enormous  task  of  learning  the  configuration  and  characteristics 
of  the  circuits  and  equipment  unique  to  that  site.  By  the  time  they 
achieve  reasonable  familiarity,  it  is  likely  that  they  will  be  transferred 
(under  normal  military  rotation  policy)  to  another  duty  station. 


The  result  of  these  pressures  is  that  only  a  fraction  of  the  personnel 
at a typical TCF have the skills and knowledge necessary to operate it.
For example, at the 2045th Information Systems Group at Andrews Air Force
Base  (discussed  in  Section  2  as  the  target  environment  for  the  Expert  Tech 
Controller  development),  there  is  a  complement  of  62  to  65  personnel,  of 
whom  50  are  trainees.  Moreover,  because  of  military  budget  and  manpower 
constraints  it  is  often  true  that  a  TCF  has  fewer  personnel  than  its 
authorized  complement. 

The  Tech  Controller's  expertise  is  complex  and  extensive,  yet 
ultimately  describable  in  terms  of  reasoning  and  inference  drawing  upon 
factual  knowledge.  This  is  precisely  the  domain  of  modern  Expert  System 
development,  in  which  Knowledge  Engineers  capture  the  expertise  of  skilled 
practitioners  in  the  target  environment  through  extended  interaction, 
translating  it  into  a  software  system  capable  of  performance  approaching 
that  of  the  experts.  A  suitable  Expert  System  can  ameliorate  the 
above-described  concerns  in  Tech  Control  in  four  ways:  (1)  guiding  novices 
through  the  solution  of  difficult  problems;  (2)  easing  the  shortage  of 
skilled  manpower,  by  allowing  discretionary  use  of  less-qualified  personnel 
in  higher-level  positions;  (3)  preserving  a  "corporate  memory"  of  circuit 
problems  rare  enough  that  they  may  never  have  been  encountered  by  personnel 
currently  assigned;  and  (4)  easing  the  training  burden  for  senior 
personnel.  Interestingly,  while  the  first  three  of  these  benefits  have 
perhaps  greater  long-term  potential,  the  fourth  aroused  the  most  enthusiasm 
among  senior  Tech  Controllers  consulted  during  the  formative  stages  of  the 
Lincoln program. Instead of spending three-fourths of their time
conducting training sessions with books, paper and pencil, senior NCOs
could  look  in  occasionally  while  the  Expert  System  guides  trainees  through 
realistic  fault  isolation  and  problem-solving  exercises. 

To  be  more  explicit,  the  function  and  operation  of  the  "Expert  Tech 
Controller"  (referred  to  as  "ETC"  for  brevity)  under  development  at  Lincoln 
Laboratory  will  appear  to  the  users  as  follows.  ETC  is  implemented  in  a 
small  Symbolics  3645  computer,  with  a  high-resolution  video  terminal  and 
keyboard,  which  will  remain  quiescent  in  a  TCF  until  invoked  to  solve  a 
problem.  When  an  outage  occurs  in  one  of  the  circuits  served  by  the  TCF, 
Tech  Controllers  currently  learn  of  it  by  means  of  a  fault  alarm  light  or 
signal  from  the  equipment,  or  (more  typically)  through  a  complaint  called 
in  by  a  user  of  the  circuit.  When  ETC  is  to  be  used  to  diagnose  a  fault, 
it  will  be  invoked  by  a  human  operator  who  enters  the  outage  symptoms  via 
keyboard  and  menu/mouse  facilities.  ETC  begins  the  fault  diagnosis  process 
by  calling  up  all  the  information  in  its  data  base  on  circuits  that  may  be 
relevant  to  the  problem;  this  reflects  the  standard  beginning  step  by  human 
experts,  which  is  to  go  to  the  1441  card  file  and  pull  all  the  cards  that 
may  be  involved.  (In  fact,  ETC's  data  base  is  created  by  effectively 
entering  1441  card  images,  by  means  of  user-friendly  editing  facilities.) 
ETC  then  displays  a  graphic  image  representing  a  diagram  of  the  failed 
circuit and related facilities; this is analogous to the hand-drawn circuit
diagrams  currently  found  on  1441  cards.  (The  graphics  are  not  used  by  ETC 
in  the  diagnosis,  but  are  provided  for  the  operator's  convenience  in 
tracing  and  understanding  the  logic.) 
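
The card-pulling step can be pictured as a lookup over circuit records that share any facility with the failed circuit. The sketch below is only a picture of that idea: the `CircuitCard` fields and the `pull_related_cards` function are hypothetical, and a real DD Form 1441 carries dozens of items that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class CircuitCard:
    circuit_id: str
    restoral_priority: int
    facilities: list = field(default_factory=list)  # ordered end to end


def pull_related_cards(card_file, failed_id):
    """Return the failed circuit's card, then every card sharing a facility.

    card_file maps circuit_id -> CircuitCard.
    """
    failed = card_file[failed_id]
    shared = set(failed.facilities)
    related = [c for c in card_file.values()
               if c.circuit_id != failed_id and shared & set(c.facilities)]
    return [failed] + sorted(related, key=lambda c: c.circuit_id)
```

Circuits that share a trunk or multiplexer with the failed circuit are returned because a fault on the common facility would be expected to affect them as well.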


ETC  then  pursues  a  fault  isolation  strategy  reflecting  the  approach 
that would be taken by a skilled human technician in similar
circumstances.  A  dialogue  is  conducted  with  the  operator  via  terminal  and 
keyboard,  requesting  status  and  parameter  information  (such  as  signal 
presence/absence,  level,  quality)  at  particular  points  of  interest  at  the 
current  stage  of  diagnosis.  Finally  ETC  presents  a  conclusion  as  to  the 
faulty  component,  together  with  a  selection  of  appropriate  corrective 
actions  for  the  human  operators  to  choose  from. 
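
One simple way to picture such a dialogue is a half-split search along the circuit, in which each question to the operator halves the suspect region. The report does not state that ETC uses this particular strategy; it is chosen here only for illustration, and the function name, the test point names, and the single-fault assumption are all hypothetical.

```python
def isolate_fault(test_points, signal_ok):
    """Locate the failed segment by half-split search.

    test_points: ordered names of measurement points along the circuit,
        from the source end to the complaining user.
    signal_ok: callable(point) -> bool, standing in for the operator's
        answer to "is the signal good at this point?".
    Assumes the signal is good at the first point, bad at the last,
    and that there is a single fault.
    Returns (last_good_point, first_bad_point).
    """
    lo, hi = 0, len(test_points) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if signal_ok(test_points[mid]):
            lo = mid   # fault lies further downstream
        else:
            hi = mid   # fault lies at or before this point
    return test_points[lo], test_points[hi]
```

The returned pair brackets the segment that a corrective action would target; a real session would also weigh signal level and quality, not just presence or absence.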

ETC  serves  operator  interest  and  training  objectives  by  highlighting 
the  region  of  the  graphics  display  which  is  the  current  focus  of  attention 
during  a  diagnostic  session  and  by  providing  explanations  (on  request)  of 
the  logic  of  the  question  currently  being  asked.  An  example  of  a  training 
feature  which  is  readily  implementable  in  ETC  (though  not  yet  in  place) 
would  be  a  special  instruction  mode  in  which  a  student  operator  is  required 
to  predict  the  next  question  in  the  dialogue  before  ETC  proceeds,  and 
records  are  kept  of  the  student's  performance.  Another  valuable  training 
feature  inherently  present  in  the  ETC  concept  is  the  ability  to  load  ETCs 
at  a  stateside  service  school  with  the  data  bases  of  overseas  TCFs,  as  a 
means  of  giving  students  a  long  head  start  in  familiarization  with  the 
specific  sites  to  which  they  are  about  to  be  assigned. 

1.2  Summary  of  Activities 

The  Program  began  with  study  and  architecture  definition  efforts  prior 
to  FY86,  as  noted  earlier.  In  October  1985  specific  arrangements  were  made 
with  the  commanding  officer  of  the  2045th  ISG  (at  Andrews  AFB)  to 
participate with Lincoln Laboratory in the development of the Tech Control
expert system, by providing the necessary expert knowledge. Choices were
made by Lincoln as to hardware and software environments for the initial
implementation (the "Mark I Expert Tech Controller"): the Symbolics 3640
computer  with  its  native  ZetaLISP  language,  hosting  an  expert  system 
development  shell  known  as  ART  (for  Automated  Reasoning  Tool),  purchased 
from  Inference  Corp. 

A  series  of  nine  intensive  Knowledge  Engineering  interactions  took 
place  through  the  fiscal  year,  four  at  the  Andrews  Tech  Control  Facility 
and  five  at  Lincoln  Laboratory.  Fault  diagnosis  techniques  were 
successively  described,  implemented  and  refined  in  this  process.  Dates  and 
locations  were: 


    3-4 December 1985        (Andrews)
    22-23 January 1986       (Andrews)
    1-2 April 1986           (Lincoln)
    1 May 1986               (Andrews)
    15-16 May 1986           (Lincoln)
    17-18 June 1986          (Andrews)
    4-5 August 1986          (Lincoln)
    10-11 September 1986     (Lincoln)
    23-25 September 1986     (Lincoln)

The  15-16  May  session  was  exceptional,  in  that  it  included  a  critical 
review of the system philosophy and current capabilities by senior
representatives  from  the  Scott  AFB  headquarters  for  Air  Force  Tech  Control 
operations  worldwide.  The  23-25  September  session  was  concurrent  with  a 
Program  Review  conducted  at  Lincoln  Laboratory  for  the  sponsors,  and 
included another system evaluation by operational Tech Control personnel; a
detailed  demonstration  of  ETC  for  the  review  attendees;  and  a  series  of 
technical  presentations  by  Lincoln  engineers  on  the  program,  followed  by  a 
discussion  period.  The  general  tenor  of  the  visitors'  reactions  was 
positive  as  to  the  form,  objectives  and  current  status  of  the  system.  They 
made  technical  comments  and  recommendations  which  are  being  addressed  in 
the  FY87  program. 

Early  in  the  year,  contacts  were  established  with  the  TCJ  SPO  at  the 
Air  Force  Electronic  Systems  Division,  which  is  managing  the  procurement  of 
the  CNCE  (Communications  Nodal  Control  Element),  a  transportable  tactical 
Tech  Control  facility  for  battlefield  applications.  The  CNCE  contains 
electronically-controlled  patching  and  testing  equipment,  and  its  operators 
carry  out  all  their  Tech  Control  functions  by  means  of  software  in  an 
AN/UYK-20  computer.  As  such,  the  CNCE  is  very  well  suited  for  a  future 
Expert  System  implementation  which  could  not  only  guide  the  operator 
through  the  solution  of  problems  but  also  directly  implement  the  chosen 
remedial  actions.  Lincoln  Laboratory  representatives  were  invited  by  the 
TCJ  SPO  to  attend  certain  project  management  meetings  at  the  CNCE 
contractor’s  plant  (Martin  Marietta),  where  contractor  personnel  were 
briefed  on  the  Expert  Tech  Controller  and  consulted  as  to  ideas  and 
mechanisms for a possible future Expert System for the CNCE.

The  following  section  of  this  report  gives  a  description  of  the 
current Expert Tech Controller implementation. Section 3 describes the
lessons learned in the project to date about expert system implementation
in a communications network environment. Section 4 gives the results of a
study carried out during the year on architectures for a simulation-based
testbed  for  System  Control  techniques.  Section  5  discusses  plans  for  FY87 
work  on  the  project. 

2.  EXPERT  TECH  CONTROLLER  SYSTEM  DESCRIPTION 

The  ETC  system  we  are  developing  is  intended  to  demonstrate  the 
potential  for  machine  intelligence  techniques  in  tech  control.  We  believe 
that  there  are  seven  task  areas  in  which  such  techniques  could  have  value 
in  future  tech  control  facilities.  Table  I  lists  these  task  areas.  The 
order  of  the  tasks  in  the  list  reflects  our  understanding  of  the  relative 
importance  of  the  tasks  to  the  tech  control  mission.  The  primary  task  of  a 
TCF  is  to  restore  service  for  its  communication  users  in  the  event  of  an 
outage caused by equipment failure, weather, hostile action, or any other cause.
However,  it  is  usually  the  case  that  some  fault  isolation  work  must  first 
be  carried  out  before  the  location  for  the  patch  can  be  determined.  Thus 
the  fault  isolation  and  service  restoration  tasks  are  closely  coupled.  The 
third  task,  outage  reporting,  is  also  closely  tied  to  fault  isolation  and 
restoration, since TCFs are required to report outages and the restoration
actions  taken  in  a  timely  fashion. 

The  fourth  task,  database  management,  is  an  off-line  task  as  carried 
out  in  current  TCFs.  The  term  "database"  is  used  by  the  tech  controllers 
to  refer  to  a  particular  looseleaf  notebook  of  data  about  the  circuits 
passing  through  the  TCF.  This  data  is  updated  automatically  by  mailings  of 
new pages from DCA headquarters. Thus, the maintenance of that particular
notebook is not a significant task. We use the term "database", however,
to refer to the total body of information used and maintained by the TCF
staff. This consists of a file of circuit data cards called "1441 cards"
that are used together with wall charts and equipment labels in fault
isolation and restoration work. In addition, a TCF has file cabinets full
of routine circuit test data, trouble histories, cable charts, floor plan
layouts, etc. that are used as needed. In order for the ETC system to
handle the first three tasks, much of this total database must be made
available to the computer. The realization of a computerized database
will have the additional benefit of reducing the effort required for a TCF
to keep its database up-to-date and correct. Thus, the fourth task,
database management, has a dual function in the ETC system.

TABLE I

ETC SYSTEM TASKS

Service restoration
Fault isolation
Outage reporting
Database management
Routine testing
Training
New facility planning

The  fifth  task,  routine  testing,  provides  the  TCF  with  information  on 
the quality of the service that the circuits are providing to the users.
By monitoring circuit quality on a periodic basis, the controller can
detect deteriorating circuit conditions and in some cases prevent any
outage by taking corrective action before actual failure occurs. The ETC
system should be able to help in this area with the scheduling of tests,
the recording of data, and the analysis of trends in the data that could
indicate incipient trouble. Data from routine tests can also be used to
advantage  in  the  fault  isolation  process  by  providing  measurement  data  for 
comparison  purposes.  Trend  analysis  can  help  in  suggesting  which  of  a  set 
of  otherwise  equally  likely  tests  to  try  next. 
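
As a rough illustration of the kind of trend analysis meant here, the sketch below fits a least-squares line to a circuit's periodic test readings and ranks circuits by how fast their readings are falling. The function names, the choice of a straight-line fit, and the assumption of evenly spaced tests are ours for the example and are not taken from the ETC design.

```python
def trend_slope(readings):
    """Least-squares slope of evenly spaced readings (units per test)."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def rank_by_deterioration(history):
    """Order circuit ids from fastest-falling readings to most stable.

    history maps circuit id -> list of readings, oldest first.
    """
    return sorted(history, key=lambda cid: trend_slope(history[cid]))
```

A ranking of this kind could serve as the tie-breaker mentioned above: among otherwise equally likely tests, try first the one involving the circuit or facility whose measurements have been deteriorating most quickly.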

The  last  two  tasks,  training  and  new  facility  planning,  can  be  viewed 
either as fringe benefits from the ETC system or major goals in
themselves. The existence of the database and the fault isolation and
service  restoration  capabilities  allow  the  ETC  to  be  used  for  training 
purposes  with  little  additional  development.  An  instructor  and  trainee  can 
use  the  system  to  follow  a  wide  variety  of  trouble  scenarios  without  risk 
to  any  actual  communication  capabilities.  With  augmentation,  ETC  could 
simulate  faults  for  the  trainee  to  diagnose  and  could  record  his  handling 
of  problems  for  later  analysis  by  an  instructor. 

The  database  by  itself  can  offer  some  help  with  new  facility  planning 
by  providing  the  controller  with  information  about  the  availability  of 
resources  such  as  test  jacks,  rack  space,  cables,  etc.  in  the  TCF.  With 
augmentation,  ETC  could  be  programmed  to  work  out  a  detailed  plan  for  the 
installation  of  new  equipment  or  reorganization  of  existing  equipment, 
picking  jack  locations,  making  wiring  schedules,  etc.  As  installation 
progressed,  the  database  would  automatically  be  kept  up-to-date. 

During  FY86  we  have  initiated  implementation  work  in  the  first  four 
areas  with  the  greatest  effort  going  into  fault  isolation  because  that  area 
is  the  one  with  the  most  potential  for  machine  intelligence  techniques.  We 
have  done  nothing  yet  to  support  routine  testing,  to  enhance  the  system's 
potential  as  a  training  aid,  or  to  help  with  new  facility  planning.  In 
FY87  we  expect  to  start  work  on  routine  testing  support  and  to  plan  for 
some  training  enhancements,  but  we  do  not  expect  to  undertake  any  activity 
in  the  new  facility  planning  task  area  in  the  next  year. 

In  future  tech  control  centers  we  anticipate  that  direct  connections 
will  exist  between  the  communication  equipment  in  the  center  and  the 
computer supporting the intelligent control functions. With such
connections the control computer will have direct access to fault
indications  and  will  be  able  to  initiate  tests  and  circuit  patches  without 
human  intervention.  Human  involvement  in  such  a  center  would  be  primarily 
supervisory  in  nature,  and  the  total  number  of  people  needed  at  a  center 
would  be  significantly  reduced  relative  to  the  current  situation  where  many 
people  are  needed  to  operate  a  large  facility  such  as  the  one  at  Andrews 
AFB. 

The  programming  environment  being  used  for  the  development  of  ETC  runs 
on  a  Symbolics  3645,  a  sophisticated  Artificial  Intelligence  workstation. 
This  environment  provides  the  machine's  native  programming  language,  LISP, 
its  object-oriented  programming  tool,  Flavors,  and  its  many  other  built-in 
system  development,  user  interface,  and  debugging  aids. 

We have augmented this environment with a commercial state-of-the-art
expert  system  building  tool  called  ART  that  is  a  product  of  Inference 
Corporation.  ART  provides  rule-based  programming  facilities  which  include 
backward  and  forward  chaining  inference  mechanisms,  a  frame-based  knowledge 
representation,  hypothetical  reasoning,  pattern  matching,  and  a  number  of 
its  own  program  development,  user  interface,  and  debugging  tools. 
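
For readers unfamiliar with forward chaining, the following minimal sketch shows the general idea: rules fire whenever all of their premises are among the known facts, and firing continues until nothing new can be concluded. It is written in Python purely for illustration and does not reproduce ART's rule syntax or inference machinery; the example facts and rules are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts appear.

    facts: iterable of strings.
    rules: list of (premises, conclusion) pairs, where premises is a
        set of strings and conclusion is a string.
    Returns the closed set of facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, for illustration only.
EXAMPLE_RULES = [
    ({"no-signal-at-user", "signal-ok-at-mux"}, "fault-between-mux-and-user"),
    ({"fault-between-mux-and-user", "spare-modem-available"},
     "recommend-swap-modem"),
]
```

Backward chaining works in the opposite direction, starting from a goal such as a suspected fault and asking which facts would have to hold to support it.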

Our experience to date suggests that this environment will provide the
functionality needed to handle the complexity of the ETC problem domain.
In particular we believe that the frame-based knowledge representation,
coupled with the procedural and rule-based processing, is powerful enough
to represent the wide variety of information necessary. Currently this
information  includes  circuit  topologies,  physical  locations  of  devices, 
measurements  (currently  obtained  by  the  user  and  eventually  to  be  obtained 
automatically), rules and heuristics from experts, book and manual
diagnostic procedures, and the structures necessary for the graphical user
interface.

2.1  Fault  Diagnosis  and  Circuit  Restoration 

Fault  diagnosis  is  the  process  of  localizing  the  failed  element  in  a 
malfunctioning  circuit.  The  Tech  Controller's  primary  goal  is  to  restore 
service  to  the  end  users  of  the  circuit,  within  a  time  interval  of  a  few 
minutes  or  longer,  depending  on  the  "Restoration  Priority"  of  the  circuit. 
Typically  he  will  pursue  fault  diagnosis  to  narrow  the  problem  area  just 
enough  to  identify  the  right  backup  facilities  or  spare  equipment  to  switch 
in,  to  restore  service  on  the  circuit.  After  service  has  been  restored,  a 
finer-grained  diagnosis  we  call  "post-restoration  fault  isolation"  can  be 
used  to  identify  the  cause  of  the  problem,  whereupon  a  repair  call  can  be 
initiated.  This  section  of  the  report  will  describe  in  detail  the 
processes  of  fault  diagnosis,  circuit  restoration,  and  post-restoration 
fault  isolation. 

2.1.1  Problem  Dimensions 

Fault  diagnosis,  circuit  restoration,  and  post-restoration  fault 
isolation  are  affected  by  several  factors.  Many  of  these  factors  interact, 
creating  a  very  complex  problem  in  terms  of  both  processing  strategies  and 
database  organization  necessary  for  successful  implementation. 

We  would  like  ETC  to  be  able  to  isolate  the  causes  of  the  various 
kinds  of  problems  (no  signal,  excessive  retransmissions,  etc.)  that  can 
occur  on  many  types  of  circuits  (voice,  digital,  etc.)  carried  by  a  wide 
variety of types of links between Tech Control Facilities (e.g., satellite
channels, landlines, HF radio). This broad problem domain involves a
complex  set  of  devices  that  must  be  represented  in  ETC's  detailed  knowledge 
base.  Compounding  these  difficulties  is  the  fact  that  budget  limitations 
over  the  years  have  resulted  in  TCFs  having  a  wide  range  of  device  models, 
manufacturers,  vintages,  and  levels  of  sophistication. 

Another  complex  aspect  of  the  problem  domain  is  coordination  between 
diagnosis  and  circuit  restoration.  The  system  is  generally  capable  of 
finer-grained  fault  isolation  than  may  be  necessary,  or  even  desirable, 
prior  to  performing  circuit  restoration.  ETC  must  be  able  to  recognize  the 
earliest  point  at  which  appropriate  spare  facilities  are  available  for 
restoration  of  service  on  the  circuit.  This  point  varies  with  not  only  the 
inventory  of  spare  devices  but  also  the  connectivity  of  trunks  between  Tech 
Control  Facilities  and  the  relative  ease  of  the  substitution  process  for 
particular  devices. 
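
The idea of an "earliest point" for restoration can be sketched as follows: as diagnosis narrows the set of suspect devices, restoration becomes possible at the first step where some available group of spare facilities can bypass every remaining suspect. The function name and the set-based representation are assumptions for the example; the ease-of-substitution factor mentioned above is not modeled.

```python
def earliest_restoration_step(suspect_sets, bypassable):
    """Find the first diagnostic step at which restoration is possible.

    suspect_sets: list of sets of device names, one per diagnostic
        step, narrowing as diagnosis proceeds.
    bypassable: list of sets; each is a group of devices that the
        available spare facilities can bypass as a unit.
    Returns (step_index, group), or None if no step qualifies.
    """
    for step, suspects in enumerate(suspect_sets):
        covering = [g for g in bypassable if suspects <= g]
        if covering:
            # Prefer the smallest group that still covers all suspects.
            return step, min(covering, key=len)
    return None
```

Diagnosis beyond the returned step would be deferred to post-restoration fault isolation.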

Further  difficulties  result  from  the  fact  that  circuits  typically  pass 
through  several  Tech  Control  Facilities  between  the  end  users.  A 
distributed problem-solving mechanism is therefore needed, providing for
communication with either expert systems or humans at other TCFs. To assure
effective interaction, protocols for this process must be established and
enforced.

This  multiplicity  of  dimensions  creates  a  very  complex  problem  space 
requiring  sophisticated  processing  mechanisms  and  database  management 
techniques. The architecture described here appears to meet these
requirements.

2.1.2 Fault Diagnosis System Architecture

A high level of expertise is necessary to perform effective fault
diagnosis  on  a  network  of  this  magnitude.  Based  on  many  knowledge 
engineering  sessions,  we  quickly  concluded  that  a  skilled  tech  controller 
is  well-informed  about  a  number  of  strategies  for  fault  diagnosis  and  about 
the  rationale  for  selecting  among  them.  Reflecting  this  general  approach 
used  by  human  experts,  we  have  developed  a  modular,  strategy-oriented 
Expert  System  to  perform  the  fault  diagnosis  functions.  This  system 
integrates  the  efficient  procedural  facilities  of  LISP  and  the  flexible 
rule-based facilities provided by ART.

Figure 1 shows the modular organization of the control mechanism for
the fault isolation functions of the expert system. The top two boxes
represent  a  sequential  series  of  information-gathering  steps  done  at  the 
beginning  of  each  diagnosis.  The  third  box  represents  the  process  carried 
out  when  necessary  to  convert  the  database  representation  of  the  circuit  in 
question  to  a  form  more  easily  manipulated  by  the  rest  of  the  system.  The 
fourth  box,  considering  all  the  circuit  and  complaint  information  collected 
thus  far,  determines  which  fault  isolation  strategies  in  the  repertoire  of 
the  system  may  be  applicable  to  the  current  problem.  This  list  of 
applicable  strategies  is  passed  to  the  strategy  control  module.  The 
strategy  control  function  initiates  each  of  these  candidates  in  turn  until 
one  of  them  isolates  the  cause  of  the  problem  to  the  level  of  a  subsystem 
that  can  be  bypassed  by  other  equipment;  at  this  point  the  strategy 
accesses  the  Circuit  Restoration  facilities  to  be  described  below. 

Figure 1. Fault isolation control.

The large box in the bottom center of Fig. 1 represents the full
repertoire of fault isolation strategies available to the system. A new
strategy can be added by simply writing the necessary code and rules, then
informing  the  Applicable  Strategy  List  generator  as  to  conditions  under 
which  the  new  strategy  may  be  applicable.  The  Investigate  Device  module  at 
the  lower  right  in  Fig.  1  contains  diagnostic  expertise  on  the  individual 
communications  devices  in  the  knowledge  base  of  the  system.  Note  that  all 
fault  diagnosis  strategies  have  access  to  this  expertise. 
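
The control flow described for Fig. 1 can be illustrated with a minimal Python sketch. This is not the LISP/ART implementation; the names `Strategy`, `register`, `applicable_strategies` and `isolate_fault`, and the two toy strategies with their conditions, are hypothetical and chosen only to show how a repertoire, an applicable-strategy list, and a strategy control loop might fit together.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Case = dict  # circuit and complaint information gathered so far


@dataclass
class Strategy:
    name: str
    applicable: Callable[[Case], bool]    # condition given to the list generator
    run: Callable[[Case], Optional[str]]  # returns the isolated subsystem, or None


REPERTOIRE: List[Strategy] = []


def register(strategy: Strategy) -> None:
    """Add a strategy to the repertoire together with its applicability condition."""
    REPERTOIRE.append(strategy)


def applicable_strategies(case: Case) -> List[Strategy]:
    """Applicable Strategy List generator: keep strategies whose condition holds."""
    return [s for s in REPERTOIRE if s.applicable(case)]


def isolate_fault(case: Case) -> Optional[Tuple[str, str]]:
    """Strategy control: try each candidate in turn until one isolates the fault."""
    for strategy in applicable_strategies(case):
        subsystem = strategy.run(case)
        if subsystem is not None:
            return strategy.name, subsystem
    return None


# Two toy strategies (hypothetical conditions and results, for illustration only).
register(Strategy("alarm-guided",
                  applicable=lambda c: bool(c.get("alarms")),
                  run=lambda c: c["alarms"][0]))
register(Strategy("signal-tracing",
                  applicable=lambda c: c.get("complaint") == "no-signal",
                  run=lambda c: c.get("dead_device")))
```

In this sketch, adding a strategy requires only one more `register` call, which mirrors the modularity claimed for the architecture; the control loop itself does not change.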

2.1.3  Level  of  Expertise 

The  sophistication  of  the  current  fault  diagnosis  and  circuit 
restoration  system  can  best  be  described  by  stating  the  extent  of  four 
factors: the types of circuits and types of complaints the system can
handle; the communications devices that have been implemented; the fault
diagnosis  strategies  that  are  in  the  system's  repertoire;  and  the  types  of 
circuit  restoration  the  system  can  currently  perform.  From  these  factors 
one  can  infer  the  types  of  problems  that  can  be  diagnosed  and  the  types  of 
circuits  on  which  the  system  can  do  fault  isolation. 

2.1.3.1 Circuit and Complaint Types

Our  long-range  goal  is  the  capability  to  do  fault  isolation  on  all 
circuit  types  of  interest  to  TCFs.  Currently  the  system  is  limited  to 
digital  circuits;  in  the  near  future  we  plan  to  add  the  capability  to 
isolate  faults  on  analog  voice  circuits.  Many  circuits  of  both  types  pass 
through the Andrews TCF, and will be used to refine the fault isolation
procedures  being  developed. 

Another  limit  of  the  system  at  present  is  that  it  can  handle  only  one 
type  of  complaint,  namely  the  "no  signal"  case  in  which  the  signal  has  been 
lost  completely  somewhere  in  the  path  of  the  circuit.  Development  is 
currently  underway  on  algorithms  to  deal  with  other  types  of  complaints. 

2.1.3.2 Device Implementation

Fault  diagnosis  software  has  been  developed  for  a  number  of  devices 
and  circuit  types.  The  work  focused  initially  on  circuit  F142,  a  weather 
data  reporting  circuit  passing  through  the  Andrews  TCF  which  links 
computers  at  Croughton,  England  and  F142  is  a  high-speed  synchronous 
circuit  using  three  major  devices  at  Andrews,  namely: 

(1)  AT&T-2096  -  Modem  /  Multiplexer 

(2)  CITAG  -  Alarm  Group  /  Buffer 

(3)  HSTDM  -  High  Speed  Modem  /  Multiplexer 

The  term  "implementation  of  a  device"  means  providing  the  Expert 
System  with  the  ability  to  diagnose  faults  caused  by  malfunction  of  the 
device.  For  each  device  that  has  been  implemented  in  its  repertoire,  the 
system  can  conduct  a  mouse-oriented,  interactive  session  in  which  the  user 
is  guided  step-by-step  through  diagnostic  procedures  on  the  device.  The 
user's  answers  are  analyzed  by  the  system  to  guide  further  questioning. 
Eventually  the  system  will  determine  whether  the  device  in  question  is 
faulty,  and  so  inform  the  user.  These  interactions  attempt  to  use 
"natural"  diagnostic  strategies  reflecting  a  combination  of  the  suggestions 
of  skilled  human  tech  controllers  and  the  technical  manuals  for  the 
devices.  We  have  observed  that  the  human  experts  tend  to  stop  the 
diagnosis  process  as  soon  as  they  know  what  equipment  substitutions  to  make 
in  order  to  restore  service,  while  blindly  following  the  manuals  would  lead 
far  beyond  that  point. 

Three  other  devices  were  then  implemented,  namely: 

(1) OMNI-MUX-160 - Multiplexer

(2)  LSTDM  -  Multiplexer 

(3)  VFCT  -  Modem  /  Multiplexer 

We have also added the capability to isolate failures that occur in
the  transmission  medium  carrying  the  signal  between  TCFs.  The  normal 
outcome  of  this  process  is  to  alternate-route  the  signal  through  another 
path  or  medium,  and  to  call  the  appropriate  service  contractor  to  repair 
the  failure. 

The  combination  of  these  six  devices  and  the  ability  to  isolate  faults 
between  TCFs  has  allowed  a  number  of  circuits  to  be  installed  in  the 
knowledge  base  of  the  Expert  System.  These  circuits  include  high-speed 
synchronous  circuits  such  as  F142,  as  well  as  low-speed  teletype  lines. 

This  set  of  installed  circuits  and  devices  amounts  to  a  very  complete 
environment  for  development  and  testing  of  rules  and  procedures  for  fault 
diagnosis,  circuit  restoration,  and  graphics.  This  test  environment  is 
currently being used to develop and refine a system to handle the
complaints  and  circuit  types  described  above,  and  encompasses  a  substantial 
fraction  of  the  circuits  that  pass  through  the  Andrews  TCF. 

2.1.3.3 Fault Isolation Strategies

Discussions  with  the  Tech  Controllers  at  Andrews  have  led  to 
identification  and  implementation  of  three  fault  isolation  strategies 
relevant  to  high-speed  synchronous  circuits  and  low-speed  TTY  lines.  The 
first  of  these  is  an  alarm-guided  mechanism.  Through  interaction  with  the 
user  (or,  in  the  future,  by  polling  a  set  of  alarm  repeater  lines)  the 
system  learns  the  current  state  of  all  indicator  lamps  and  alarms  on  the 
devices  in  the  circuit.  These  alarms  are  then  analyzed  by  the  system, 
based  on  its  knowledge  of  the  nature  and  causes  of  alarms,  to  guide  the 
fault  isolation  process. 


If  there  are  no  alarm  indications,  or  if  the  first  strategy  is 
unsuccessful  in  isolating  the  cause  of  the  fault,  two  other  strategies  may 
be  used.  Both  implement  a  signal-tracing  mechanism,  one  for  high-speed 
synchronous  circuits  and  one  for  low-speed  TTY  lines.  This  kind  of 
strategy  seems  to  be  the  one  chosen  most  often  by  tech  controllers.  It 
makes  use  of  test  points  where  the  signal  can  be  observed  without 
disrupting  service,  to  identify  the  device  or  devices  that  may  be  causing 
the  problem.  Once  a  device  is  identified  as  a  potential  problem,  more 
extensive  device-specific  questioning  can  proceed  to  determine  whether  it 
is  actually  at  fault. 
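
The signal-tracing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the system's own code: the function name `trace_signal`, the ordering of devices, and the test-point labels `TP-A` to `TP-C` are hypothetical, and the `signal_present` callback stands in for a question put to the user at a non-disruptive test point.

```python
from typing import Callable, List, Optional, Tuple


def trace_signal(path: List[Tuple[str, str]],
                 signal_present: Callable[[str], bool]) -> Optional[str]:
    """Walk the circuit path from upstream to downstream.

    `path` holds (device, test point at the device output) pairs.
    Returns the first device whose output test point shows no signal,
    i.e. the candidate for device-specific questioning, or None if the
    signal is present at every test point.
    """
    for device, test_point in path:
        if not signal_present(test_point):
            return device
    return None


# Hypothetical path and observations, for illustration only.
path = [("HSTDM", "TP-A"), ("CITAG", "TP-B"), ("AT&T-2096", "TP-C")]
observed = {"TP-A": True, "TP-B": False, "TP-C": False}
suspect = trace_signal(path, lambda tp: observed[tp])
```

The device returned is only a suspect; as the text notes, device-specific questioning is still needed to decide whether it is actually at fault.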

It  is  expected  that  additional  strategies  will  be  implemented 
throughout  the  development  of  the  system.  For  example,  a  fourth  strategy 
(for  locating  problems  on  trunks)  and  a  fifth  (to  handle  excessive- 
retransmission  problems)  are  currently  under  development. 

2.1.3.4 Circuit Restoration

The  tech  controller's  top  priority  when  working  with  a  circuit  outage 
is  to  restore  service  to  the  users  of  the  circuit  as  quickly  as  possible. 
This  could  include  repairing  or  adjusting  devices  on  the  current  path  of 
the  circuit,  or  rerouting  part  or  all  of  the  circuit  path  through  alternate 
channels,  devices,  trunks,  or  tech  control  facilities.  It  is  important 
that  the  Expert  System  place  a  similarly  high  priority  on  circuit 
restoration. 

Outages  that  require  alternate  routing  create  a  particularly 
interesting  type  of  problem.  We  have  been  creating  knowledge 
representations  and  alternate  routing  strategies  that  will  guide  the  tech 
controller to use the most effective alternate routes and restore service
as quickly as possible. Some circuits use devices like 2096s and HSTDMs
which often have full-time spare facilities available at the flick of a
switch; rerouting these circuits can be a straightforward problem.
Rerouting circuits that use devices such as the VFCTs or LSTDMs can be more
complex. For example, the Andrews TCF has 32 VFCTs with a number of
spares. A circuit outage caused by a bad channel on one VFCT involves
locating the best way to reroute the single affected circuit. The
preferred solution according to current practice is the first of the
following three procedures that is possible:

(1) Reroute the circuit on a working spare channel on the same VFCT.

(2) Reroute the circuit through another VFCT with the same
destination.

(3) Find a solution in a card file built up over the years on how to
get certain important circuits from one place to another.

Along with being able to restore service by substitution of complete
devices, trunks or channels, as described above, the Expert System can
suggest other possible routes selected from the equivalent of the card
file. The system modifies the graphics displays to show alternate routing
mechanisms as they are implemented. Details of device and trunk patches in
use, as well as spare devices temporarily in use for test purposes, are
shown on the displays. This helps the user to better visualize the
dynamics of the circuit configuration during and after fault diagnosis.
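
The "first feasible of three procedures" rule for VFCT rerouting can be expressed as a short Python sketch. The data layout (a dictionary of VFCTs keyed by name, each with a destination and a list of spare channels), the function name `choose_restoration`, and the channel numbers are assumptions made for illustration; the report does not describe the actual knowledge representation at this level.

```python
from typing import Dict, Optional, Tuple


def choose_restoration(circuit_id: str,
                       failed_vfct: str,
                       vfcts: Dict[str, dict],
                       card_file: Dict[str, str]) -> Optional[Tuple[int, str]]:
    """Return (procedure number, description) for the first feasible procedure."""
    failed = vfcts[failed_vfct]

    # (1) A working spare channel on the same VFCT.
    if failed["spare_channels"]:
        return 1, f"channel {failed['spare_channels'][0]} on {failed_vfct}"

    # (2) Another VFCT with the same destination and a spare channel.
    for name, vfct in vfcts.items():
        if (name != failed_vfct
                and vfct["destination"] == failed["destination"]
                and vfct["spare_channels"]):
            return 2, f"channel {vfct['spare_channels'][0]} on {name}"

    # (3) A previously recorded route from the (equivalent of the) card file.
    if circuit_id in card_file:
        return 3, card_file[circuit_id]

    return None  # no restoration route known


# Hypothetical inventory, for illustration only.
vfcts = {
    "VFCT-A": {"destination": "Norfolk", "spare_channels": []},
    "VFCT-B": {"destination": "Norfolk", "spare_channels": [7]},
}
plan = choose_restoration("7DUM", "VFCT-A", vfcts, card_file={})
```

Because the procedures are tried in a fixed order, the sketch always prefers the least disruptive substitution that is currently available.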


2.1.4  Demonstration  Scenario 


The best way to understand exactly what the Expert System does is to
actually see it work. Since that is impossible in a report, a realistic
scenario will be described using copies of a number of terminal screen
displays.  The  purpose  of  this  section  is  to  give  the  reader  a  general  feel 
for  the  problem-solving  process,  along  with  an  introduction  to  the  user 
interface  and  graphics  capabilities  of  the  system. 

The  scenario  involves  a  circuit  between  Bermuda  and  Carswell,  Texas 
which  is  designated  7DUM.  Each  interaction  between  the  user  and  the  Expert 
System  begins  with  a  question  to  determine  what  the  user  wants  to  do 
(Fig.  2).  In  this  case  we  selected  "Diagnose  Fault".  It  should  be  noted 
that  nearly  all  the  questions  asked  of  the  user  are  mouse-sensitive  menus 
or  items;  this  makes  the  system  extremely  easy  to  learn  and  use. 

The  next  question  (Fig.  3)  asks  for  the  type  of  complaint  that 
initiated  the  fault  diagnosis  process.  In  this  case  we  are  assuming  that  a 
user  complaint  was  received.  Another  interaction  elicits  the  designation 
of  the  circuit  to  be  diagnosed  (7DUM),  and  the  system  consults  its  database 
and  produces  three  important  graphic  displays.  The  first  (Fig.  4)  is  an 
exact  replica  of  the  DD  Form  1441  for  this  circuit  that  is  currently  in  the 
physical  card  file  at  the  Andrews  TCF;  this  is  the  source  that  the  tech 
controllers  now  use  to  obtain  most  of  the  circuit  information.  The  other 
two graphic displays, shown in Fig. 5(a), are an overall circuit diagram of
the  source,  destination  and  intermediate  facilities  (bottom)  and  a  detailed 
diagram  of  the  devices  in  the  circuit  path  within  the  Andrews  TCF  (top). 
Close  examination  of  the  upper  graphic  in  Fig.  5(a)  shows  the  system 
representation for three different multiplexers. This scenario assumes
that  the  fault  is  in  the  VFCT  on  the  right.  The  small  numbers  to  the  left 
of  this  box  indicate  all  the  circuits  using  it,  and  sharing  trunk  6J04 
exiting toward the Norfolk TCF on the right. Multiplexer inputs labeled
with a dashed line instead of a number indicate spare channels, as shown on
the ATT-2096 and the OMNI-MUX-160 in the display. Because of the larger
size and high resolution of the actual Symbolics terminal screen, these
features are more easily interpreted than the screen images shown here
would indicate. These circuit diagrams are generated dynamically from the
database, and contain much more information than tech controllers currently
show in hand-drawn circuit diagrams on the backs of 1441 cards.

Figure 4. 1441 display.

2.1.4.1 The Fault

The  fault  that  is  assumed  for  the  purposes  of  this  sample  scenario  is 
a  bad  receive  channel  on  the  VFCT  in  Fig.  5(a).  Figure  5(b)  shows  a 
replica  of  this  diagram  that  has  been  hand  marked  to  show  all  the 
consequences  of  this  fault  that  could  be  observed  by  the  tech  controller. 

In  order  to  be  able  to  give  logical  and  consistent  replies  to  the  Expert 
System  in  the  course  of  the  diagnosis,  this  information  has  to  be  worked 
out  in  advance,  as  follows.  Carswell  would  be  receiving  a  constant  mark 
instead  of  the  expected  continual  keying.  The  only  device  having  alarm 
indicators  on  this  circuit  is  the  ATT-2096.  Since  the  signal  for  7DUM  at 
the  ATT-2096  is  riding  trunk  6G0H,  which  also  carries  traffic  for  other 
circuits,  the  loss  of  just  7DUM  will  not  cause  any  alarm  conditions  on  the 
device.  Because  of  the  direction  and  nature  of  the  assumed  fault,  signals 
are not present at jacks FPI-1133-1-22 and FPI-1230-3-9. Since the problem
is  with  only  one  receive  channel  of  the  VFCT  it  can  be  assumed  that  a 
signal  is  being  sent  from  NFK-BFC  (the  Norfolk  TCF)  and  that  all  other 
outputs  from  the  VFCT  are  present. 


Figure 5(b). Symptoms of assumed fault. Hand-marked annotations: complaint,
receiving mark (no keying); fault, bad receive channel.


Figures  6  and  7  show  the  system  being  informed  as  to  the  type  of 
complaint  (Receiving-Mark)  and  the  source  of  the  complaint  (Carswell). 
Figure  8  shows  the  system  displaying  a  mouse-sensitive  image  of  the  alarm 
panel of an ATT-2096. The darkened items are those that are "on" in the
normal  state,  and  in  this  case  the  user  need  only  mouse  EXIT  to  indicate 
that  no  alarms  are  active.  Mouse-sensitive  alarm  panel  images  are  used  by 
the  Expert  System  for  all  devices  having  alarms.  Care  has  been  taken  to 
graphically  recreate  the  exact  appearance  of  the  panel  of  each  device;  this 
is  intended  to  make  the  system  easier  to  learn  and  use. 

2.1.4.2 Signal-Tracing Strategy

Having  found  no  useful  alarm  information,  the  system  initiates  a 
signal-tracing  strategy  to  isolate  the  faulty  component.  Since  the  problem 
exists  in  the  signal  being  sent  from  NFK-BFC  to  CAR-TCK,  the  first  place  to 
check whether a signal is present is at jack FPI-1133-1-22 (Fig. 9), the
upstream-most  digital  jack  that  carries  only  7DUM.  Note  the  dashed  box 
around  the  jack;  this  box  indicates  the  current  focus  of  attention  of  the 
system,  and  always  identifies  the  device  that  the  user  would  have  to  find 
in  order  to  make  observations  or  run  tests  currently  needed  in  the 
diagnosis  process.  In  this  case  the  user  is  mousing  the  word  NO  in  the 
window  at  the  top  center  of  the  screen.  Figure  9  also  shows  the  status 
(RECEIVING-MARK)  under  the  Carswell  TCF.  The  status  line  is  continually 
updated  for  all  components  of  the  circuit  during  the  diagnosis. 

Since there is no signal at jack FPI-1133-1-22, the system looks
upstream  and  considers  the  VFCT.  Figure  10  shows  the  system  asking  if 
there  are  any  good  signals  coming  from  the  VFCT.  This  will  help  determine 
whether there is a problem with the entire trunk, or just the channel
carrying 7DUM on the trunk. Since the user is mousing the reply YES the
system must find out whether the upstream tech control facility, NFK-BFC,
is sending a signal on 7DUM (Fig. 11). At this point the user would call
up NFK-BFC and ask them to verify that they are sending the signal, and the
user would mouse the reply YES as shown. The system therefore concludes
that only the channel carrying 7DUM on the trunk is faulty. This could be
caused by a bad VFCT component at either Andrews or Norfolk.

Figure 9. Signal tracing query.

Figure 10. VFCT good/bad query.
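
The chain of inference in this part of the scenario can be summarized as a small decision function. The Python sketch below follows the three questions asked in the text; only the path where every answer matches the scenario (no signal at the jack, other VFCT outputs good, upstream facility sending) is taken from the report. The outcomes for the other answers, and the names used, are assumptions added to make the function complete.

```python
def classify_fault(signal_at_jack: bool,
                   other_outputs_good: bool,
                   upstream_sending: bool) -> str:
    """Classify a receive-side fault from three observations.

    signal_at_jack     -- is a signal present at the first jack checked?
    other_outputs_good -- are any other outputs of the VFCT good?
    upstream_sending   -- does the upstream TCF confirm it is sending?
    """
    if signal_at_jack:
        # Assumption: the fault lies elsewhere, so tracing would continue.
        return "continue-tracing"
    if not other_outputs_good:
        # Assumption: no good outputs at all points to the whole trunk.
        return "entire-trunk"
    if not upstream_sending:
        # Assumption: the problem originates at or before the upstream TCF.
        return "upstream-facility"
    # Path followed in the scenario: only the channel carrying the circuit is bad.
    return "single-channel"
```

For the answers given in the scenario, `classify_fault(False, True, True)` yields `"single-channel"`, matching the conclusion that only the channel carrying 7DUM is faulty.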

2.1.4.3 Circuit Restoration

The  signal-tracing  strategy  has  isolated  the  fault  to  channel  7DUM 
riding  on  trunk  6J04.  At  this  point  the  system  recognizes  that  service  can 
be  restored  immediately  on  7DUM  by  substitution  of  good  equipment,  leaving 
the  actual  identification  of  the  failed  board  or  component  to  be  completed 
at  leisure.  The  easiest  restoration  procedure  would  be  to  locate  a  spare 
channel  on  the  VFCT  on  trunk  6J04  and  (coordinating  with  a  tech  controller 
at Norfolk) switch 7DUM over to it. The system discovers, however, that
there  are  no  spare  channels  on  6J04;  all  the  inputs  on  the  left  of  the  VFCT 
have  circuit  designators  filled  in.  Fortunately  the  system  knows  that 
Andrews  operates  several  VFCT  trunks  to  Norfolk,  and  is  able  to  find  a 
spare  channel  on  one  of  them,  namely  trunk  6H34.  The  patch  is  coordinated 
with  NFK-BFC,  and  is  shown  graphically  in  Fig.  12.  This  display  uses 
curved  lines  to  indicate  patch  cords,  clearly  indicating  the  patches 
necessary  to  implement  the  switch  of  7DUM  from  the  faulty  channel  on  6J04 
to  a  new  channel  on  6H34. 

After  the  restoration  is  completed  the  system  produces  the  DD  Form 
1443 TROUBLE AND RESTORATION RECORD (Fig. 13) that is normally filled out
by the tech controller after each outage. This one is shown only partially
filled  out.  Work  is  currently  in  progress  to  allow  the  system  to  complete 
the  form. 

2.1.4.4 Post-Restoration Fault Isolation
Although  at  this  point  service  has  been  restored  on  7DUM,  the  system 
has  not  yet  identified  the  component  that  must  be  repaired.  At  his  first 
opportunity,  a  tech  controller  working  on  this  problem  would  perform  a 
finer-grained  fault  isolation  so  that  the  appropriate  repair  service 
personnel  can  be  dispatched.  For  example,  the  symptoms  in  the  scenario 
described  above  could  have  been  caused  by  a  fault  in  either  the  VFCT  at 
Andrews  or  the  VFCT  at  the  other  end  of  the  trunk  at  Norfolk.  One  menu 
item above the 1443 card (Fig. 13), BEGIN-FURTHER-ISOLATION, initiates this
process.  It  is  not  illustrated  in  the  present  scenario  because  this 
section  of  the  system  is  currently  under  development. 

2.1.5  Fault  Diagnosis  and  Circuit  Restoration  Summary 
The  goal  of  this  section  has  been  to  familiarize  the  reader  with  the 
complexity  and  the  issues  involved  in  performing  fault  diagnosis  and 
circuit restoration on a network of this magnitude. Discussion of the
problem dimensions, programming environment, and system architecture
was intended to illustrate the nature of the problem and how it is being
attacked.  The  level  of  expertise  now  incorporated  was  described  in  terms 
of  the  current  extent  of  three  system  characteristics:  devices  implemented, 
fault  isolation  strategies  implemented,  and  the  methods  of  circuit 
restoration  now  available.  A  detailed  demonstration  scenario  was  presented 
to  give  the  reader  both  a  general  feel  for  the  problem  solving  process,  and 
an introduction to the user interface and graphics capabilities of the
system. 

2.2 Data Entry and Management

One  of  the  prerequisites  for  running  ETC  is  the  existence  of  a 
database of circuits and devices. The correct and efficient creation of
such a database is a significant project in its own right.

The database contains information from a variety of sources. The
front of the DD Form 1441 "Circuit Data" card in current use contains basic
information that must be placed in ETC's database for each circuit. The
back of the 1441 card typically has a hand-drawn graphic layout of the
circuit that may provide additional useful information for the database.
Finally, on-site observations and various other sources may provide
supplementary data that needs to be incorporated in the database. For our
purposes here we will assume that the information has been located; it
remains for us to enter it into the database.

In the early days of the ETC project it was expedient to build the
database manually by invoking the Symbolics system editor and manipulating
the various files that contain circuit layout and device information. Such
a procedure is cumbersome and error-prone, and does not provide any
validity-checking of the information that is entered. Operation of the
editor is complex enough that it is not feasible for military operations
personnel. Furthermore, the inter-relationships of the various pieces of
information, e.g., the circuit layout and the devices involved, must be
determined entirely by the person entering the information. Clearly, a
system is needed for convenient and error-free management of the database.


Such  a  software  system  could  serve  many  purposes.  It  could  satisfy 
our  short-term  needs  to  enter  additional  circuits  in  our  database.  In  the 
field  it  could  enable  one  to  transfer  the  contents  of  a  site's  1441  card 
file  and  related  information  into  the  database  and  thereby  be  able  to  test 
ETC  under  realistic  conditions.  On  a  long-term  basis  it  could  enable  one 
to  generate  and  maintain  a  site's  file  of  1441  cards  and  related 
information  completely  in  the  computer. 

These  motivations  led  us  to  begin  work  on  CADET  (acronym  for  Circuit 
And  Device  Entry  Tool).  We  began  working  on  CADET  toward  the  end  of  FY8b 
when  it  became  obvious  to  us  (and  to  personnel  from  Scott  AFB  who  were 
attending  a  demo  of  ETC)  that  the  manual  editing  approach  was  inadequate. 

We  have  produced  a  first  version  of  CADET  which  handles  the  entry  of  a  1441 
card  and  the  circuit  involved,  provided  that  the  underlying  trunk  circuit 
and  devices  have  previously  been  entered.  Further  work  is  being  pursued  to 
extend  the  scope  of  CADET. 

2.2.1 Design Goals

Our  design  goals  for  CADET  include  the  following: 

1. Familiar representation: Since the 1441 card is the medium used by
the  Tech  Controller  for  information  about  a  circuit,  and  since  we  wanted  to 
provide  a  means  for  entering  such  cards  into  the  database,  we  felt  that  the 
representation  that  CADET's  user  sees  on  the  terminal  display  should  be  a 
replica  of  a  1441  card.  This  replica  could  be  the  framework  in  which 
information  is  solicited  from  the  user. 

2.  Interactive:  CADET  should  be  interactive,  providing  immediate 
feedback  to  the  user.  Such  feedback  could  be  a  request  for  more 
information about a datum that was just entered, an error complaint about
it, or a request for the next datum. These responses should occur while
the  user's  actions  are  still  fresh  in  his  mind. 

3.  Smart  typewriter:  CADET  can  be  a  smart  typewriter  by  providing 
automatic  positioning  to  each  field  of  the  1441  card  in  turn,  forcing  the 
user  to  provide  the  contents  of  a  field  that  must  be  filled  in  before  going 
on  to  the  next  field,  preventing  the  overlapping  of  fields,  complaining 
about  errors  in  fields,  etc. 

4.  Minimize  typing:  As  a  smart  typewriter,  CADET  should  supply, 
wherever  possible,  a  "pre-typed"  menu  of  permissible  responses  to  a 
request.  Such  a  menu  provides  at  a  glance  the  set  of  permissible  choices, 
thereby  clarifying  for  the  user  exactly  what  is  expected.  Furthermore, 
choosing  from  a  menu  avoids  any  possibility  of  typing  errors. 

5.  Computer  experience  not  needed:  CADET  should  provide  facilities 
directly  related  to  the  function  of  generating  and  modifying  1441  cards 
without  demanding  that  the  user  have  experience  in  programming  and  using 
computers.  In  particular,  the  user  should  not  be  required  to  learn 
computer  editors,  languages,  and  translators  and  should  be  shielded  from 
the  operating  system  as  much  as  possible. 

6.  Error  and  validity  checking:  CADET  should  check  the  data  provided 
by  the  user  for  correctness,  validity,  and  consistency  with  other  data  in 
the  database  as  well  as  with  other  data  that  has  just  been  entered.  If  an 
error  is  detected,  the  user  should  be  required  to  correct  it  on  the  spot. 

7.  Easy  modification:  Besides  entering  new  information,  the  user 
should  be  able  to  conveniently  modify  or  remove  previously  entered 
information. 

8.  Detailed  operating  instructions:  At  all  times  the  user  should  be 
informed  of  what  is  expected  and  the  various  actions  that  he  can  perform. 


2.2.2  Current  Status 


We  have  produced  an  early  version  of  CADET  which,  for  the  most  part, 
satisfies  the  goals  described  above.  Subject  to  its  limitations,  CADET  is 
usable  in  a  variety  of  situations. 

CADET  is  one  of  the  activities  that  can  be  selected  from  the  main  menu 
of  ETC.  As  its  first  action  upon  being  selected,  CADET  asks  the  user  for 
the  name  of  the  circuit  to  be  entered  or  modified.  If  the  user  responds 
with  the  name  of  a  circuit  that  already  exists  in  the  database,  CADET 
produces  a  display  of  the  1441  card  for  that  circuit  and  affords  the  user 
the  opportunity  to  correct  any  entry  on  the  card.  If  the  circuit  does  not 
already  exist,  then  a  blank  card  is  displayed  and  CADET  begins  enforcing  a 
discipline  upon  the  user  of  entering  data  for  each  field  on  the  card  in 
turn  before  going  on  to  the  next  field.  At  any  time  in  the  data  entry 
process,  however,  the  user  is  free  to  interrupt  work  on  the  current  field 
to  go  back  and  make  a  change  or  correction  in  a  field  already  filled  in. 
Each  field  of  the  card  has  entry  and  error-checking  routines  associated 
with  it  in  a  table-driven  fashion.  These  routines  may  include  menus,  rules 
for  valid  data,  and  other  characteristics  of  the  field  in  question. 
Additionally, each field has an indication as to whether it must be filled
in or may be left blank. In the former case, CADET does not allow the user
to leave the field blank and go on to another field.
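
The table-driven arrangement described above can be illustrated with a minimal Python sketch. It is not CADET's implementation: the `FieldSpec` record, the field lengths, the `REMARKS` field, and the menu contents are hypothetical, and only the field names `CCSD` and `TYPE CIRCUIT` come from the 1441 card as discussed in this report. The sketch also assumes that a menu field holds space-separated menu choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldSpec:
    name: str
    required: bool                     # must the field be filled in?
    max_len: int                       # space available on the card
    menu: Optional[List[str]] = None   # permissible choices, if menu-driven


# One table entry per card field (illustrative values only).
FIELD_TABLE = [
    FieldSpec("CCSD", required=True, max_len=8),
    FieldSpec("TYPE CIRCUIT", required=True, max_len=12, menu=["FP", "FD", "N2"]),
    FieldSpec("REMARKS", required=False, max_len=40),
]


def check_field(spec: FieldSpec, value: str) -> Optional[str]:
    """Return an error message for this field, or None if the value is valid."""
    if value == "":
        return f"{spec.name} must be filled in" if spec.required else None
    if len(value) > spec.max_len:
        return f"{spec.name} is longer than {spec.max_len} characters"
    if spec.menu is not None:
        bad = [tok for tok in value.split() if tok not in spec.menu]
        if bad:
            return f"{spec.name}: {bad[0]} is not a menu choice"
    return None


def check_card(card: Dict[str, str]) -> Dict[str, str]:
    """Check every field in table order; return {field name: error message}."""
    errors = {}
    for spec in FIELD_TABLE:
        message = check_field(spec, card.get(spec.name, ""))
        if message:
            errors[spec.name] = message
    return errors
```

Keeping the per-field rules in a table means that a new card field, or a changed rule, is a data change rather than a change to the entry logic.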

Heavy  use  is  made  of  the  Symbolics  graphics  and  interactive 
capabilities.  In  order  to  focus  the  user's  attention,  the  field  currently 
being  accessed  is  highlighted  in  reverse  video.  Each  field  is  a 
mouse-sensitive  region,  so  that  the  mouse  can  be  used  to  select  a  field 
when out-of-order modifications are desired. As one moves the mouse, each
field passed over is highlighted by means of a thick line around it,
thereby informing the user as to which field would become the current one
were he to click the left mouse button. When all available space in a
field has been filled by typing, any attempt to type more into it results
in a flash of the screen and the sounding of a beep.

A  window  is  used  by  CADET  to  provide  instructions  to  the  user.  The 
contents  of  this  window  depend  upon  the  field  being  accessed  and  whether  it 
is  empty  or  filled.  The  instructions  tell  the  user  how  to  fill  an  empty 
field,  how  to  empty  a  filled  field  (if  it  may  be  emptied),  and  how  to 
modify  a  field.  A  specific  message  tells  the  user  if  the  field  is  one  that 
must  be  filled.  The  aim  of  the  instructions  window  is  to  leave  no  doubts 
in  the  novice  user's  mind  as  to  what  he  may  do.  On  the  other  hand,  a 
knowledgeable  user  can  proceed  rapidly  with  the  data  entry  process  by 
simply  not  reading  the  instructions. 

At  any  time  a  user  may  cancel  the  entire  session  and  return  to  the 
main  menu.  When  the  user  has  passed  through  the  last  field  of  the  card  (or 
when he is editing an existing card) he is then given the option of
accepting the card and thereby incorporating it into the database. If he
does accept the card, then a new database entry is made (in the case of a
new  circuit)  or  a  previously-existing  entry  is  modified  to  reflect  the 
desired  changes. 

Demonstrations  of  the  current  version  of  CADET  have  been  very 
positively  received.  Some  viewers  have  even  felt  that  it  would  be  a  useful 
tool  as  is,  without  any  further  extensions. 


2.2.3  An  Example 


Figure  14  shows  a  dump  of  the  Symbolics  terminal  screen  during  a 
session  of  CADET.  We  now  describe  the  contents  of  the  screen. 

The  big  window  in  the  center  of  the  display  is  a  replica  of  the  front 
of a 1441 card that has been filled in for circuit "7DUM" (see "CCSD"
fields in the second and bottom rows). In the top row on the right there
are two mouse-sensitive regions: "CANCEL?" for cancelling the entire
session, and "ACCEPT?" for accepting the session and incorporating the
material  into  the  database.  (For  protection,  mousing  either  of  these  two 
regions  results  in  a  request  for  confirmation  from  the  user  before  the 
action  is  carried  out.) 

Each  field  in  the  card-window  is  mouse-sensitive  and  may  be  selected 
as  desired  for  modifications.  In  particular,  "TYPE  CIRCUIT"  is  the 
currently-selected  field  and  is  indicated  as  such  by  being  displayed  in 
reverse  video.  By  looking  at  the  Instructions  window  (the  topmost  one)  we 
can  see  that  this  current  field  was  selected  for  modification. 

The  Instructions  window  tells  us  that  the  current  field  may  not  be 
left  blank  (unlike  some  other  fields  on  the  card,  which  have  been  left 
blank).  It  tells  us  the  old  contents  that  are  being  changed  and  how  we  may 
reinstate  them  without  changing  them.  Finally  it  tells  us  that  CADET  is 
waiting  for  the  user  to  choose  a  new  item  from  the  menu  that  is  displayed 
immediately  to  the  left  of  the  card-window. 

A careful examination of the situation allows us to infer what has
happened. The user originally inserted into the TYPE CIRCUIT field the
contents "FP FD NS". He has since realized that this was an error and that
"NS" should be "N2". He therefore moused the field, reinserted "FP FD",
and is mousing on the "N2" in the menu (note the box around "N2" in the
menu) in anticipation of inserting it into the field. Once he has
completed this field, he may go on to another field for modification, or he
may accept the card, or he may even cancel the whole session.

Figure 14. CADET display

2.2.4  Future  Work 

Much  work  needs  to  be  done  before  CADET  can  reasonably  be  regarded  as 
a  prototype  of  a  field-deployable  system.  We  expect  to  deal  with  many  of 
these  issues  before  the  field  demonstration  scheduled  for  February  1987  at 
Andrews  AFB. 

The  error  and  validity  checking  for  some  of  the  fields  is  fairly 
extensive,  but  for  the  others  it  is  minimal  or  non-existent.  We  need  to 
fill  these  gaps.  For  example,  some  fields  may  be  filled  only  from  a 
predefined  set  of  entries;  while  this  is  easily  handled  with 
mouse-sensitive  menus  for  fields  having  small  sets  of  permissible  entries, 
other  fields  may  have  hundreds  of  allowed  choices.  We  must  decide  how  best 
to  handle  such  big  sets,  from  the  points  of  view  of  checking  and  presenting 
to  the  user  the  permissible  responses. 
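
One possible treatment of both small and large sets is sketched below in
Python, purely for illustration (CADET itself is LISP-based). The names
FieldSpec and MENU_LIMIT, the cutoff of 20 entries, and the idea of
narrowing a large set by typed prefix are all assumptions made for this
sketch, not features of the current implementation.

```python
from dataclasses import dataclass

MENU_LIMIT = 20  # assumed cutoff: larger sets are narrowed by prefix, not shown whole


@dataclass(frozen=True)
class FieldSpec:
    name: str
    allowed: frozenset      # permissible entries for this field
    required: bool = False  # True if the field may not be left blank

    def validate(self, value: str) -> bool:
        """Accept a blank only for optional fields; otherwise require a permissible entry."""
        if value == "":
            return not self.required
        return value in self.allowed

    def choices(self, typed: str = "") -> list:
        """Return the entries to present to the user.

        Small sets are returned in full, as a menu would show them; large
        sets are narrowed to the entries that start with what has been typed.
        """
        if len(self.allowed) <= MENU_LIMIT:
            return sorted(self.allowed)
        return sorted(e for e in self.allowed if e.startswith(typed))
```

With this arrangement the same validate call serves both kinds of field,
and only the presentation step differs between a menu and a narrowed list.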

The  current  implementation  of  CADET  assumes  that  the  database  already 
contains  descriptions  of  all  trunks  that  will  carry  any  new  user  circuit 
being  entered.  We  plan  to  extend  CADET  to  handle  the  input  of  new  trunks. 
One  possibility  involves  a  push-down  mechanism,  allowing  the  user  to  enter 
a  trunk  in  the  middle  of  the  process  of  entering  a  user  circuit. 
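
A push-down mechanism of this kind could be organized as a stack of
partially filled forms, as in the following Python sketch. The class
EntrySession, its method names, and the dictionary layout of the database
are hypothetical and are offered only to make the idea concrete.

```python
class EntrySession:
    """Stack of partially filled forms; the top of the stack is the one being edited."""

    def __init__(self, database):
        self.database = database  # dict mapping (kind, name) to a dict of fields
        self.stack = []

    def begin(self, kind, name):
        """Suspend whatever is being entered and start a new form on top of it."""
        self.stack.append({"kind": kind, "name": name, "fields": {}})

    def fill(self, field, value):
        """Fill one field of the form currently being edited."""
        self.stack[-1]["fields"][field] = value

    def accept(self):
        """Store the current form and resume the one that was suspended, if any."""
        form = self.stack.pop()
        self.database[(form["kind"], form["name"])] = form["fields"]
        return self.stack[-1] if self.stack else None
```

Here a user entering a circuit would call begin("trunk", ...) when a
missing trunk is discovered, fill and accept the trunk, and then find the
circuit form restored exactly as it was left.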

Another limitation of CADET is the assumption that the database
already contains descriptions of all the devices involved in the circuit
being entered. We are planning to provide a mechanism for entering new
devices and their specifications into the database. Finally, we need
mechanisms for making permanent modifications to the database, i.e., onto
the disk, rather than the current limitation of merely updating the
database in active memory.

3.  LESSONS  LEARNED 

The  Expert  Tech  Controller  project  has  encountered  many  obstacles  that 
are  typical  of  expert  system  development.  The  solutions  developed  for 
these  specific  problems  may  contribute  to  a  more  global  domain-independent 
understanding  of  viable  techniques  for  expert  system  design  and 
implementation. 

3.1.  Software  Environment 

Probably  the  most  publicized  controversy  in  expert  system  development 
is  the  debate  about  what  kinds  of  hardware  and  software  development 
environments  are  necessary  or  desirable.  This  section  will  discuss  the 
decisions  made  thus  far  in  this  project,  and  how  these  decisions  have 
impacted  the  work. 

Software  environments  for  most  Artificial  Intelligence  applications  in 
the  US  have  traditionally  been  based  in  LISP.  LISP  by  itself  is  a 
high-level  language  that  facilitates  symbolic  processing.  Development  of 
the  Expert  Tech  Controller  has  been  done  exclusively  with  LISP  and 
LISP-based  tools.  These  tools  include  the  Symbolics  system  software 
features  and  ART,  the  expert  system  shell  developed  by  Inference 
Corporation. 

The advantages of the Symbolics software environment include rapid
prototyping features; incremental LISP compilation; a "smart" LISP editor;
an object-oriented programming facility (Flavors); and extensive graphics
capabilities. The main disadvantage of this programming environment is
that it is machine-dependent, with the machine being very expensive. Our
opinion of the situation seems to coincide with the general consensus in
the expert system development world. For prototyping and development the
environment is extremely valuable. For delivery systems, however, the
costs of Symbolics hardware, software, and maintenance contracts are so
high that one would like to port any final software product to a more
conventional, less expensive environment. The problem with this is that
many features do not port gracefully. Consideration of portability during
development, by avoiding such features, may inhibit the development
process.

The major software tool used for the development of the Expert Tech
Controller has been ART. We found the ART environment to be quite useful
during early stages of system development. The inference mechanism of ART,
like those of other typical shells, seems to be quite effective for small
to moderate-sized systems. As a system grows, a pure rule-based system
tends to strain the capability of these inference mechanisms. The solution
devised for the Expert Tech Controller has been to encode directly in LISP
those portions of the system that are procedural rather than rule-based in
nature; this includes (for example) many of the individual fault isolation
strategies. When invoked as appropriate from within the rule-based
structure that remains within ART, these sections execute very rapidly
without causing the inferencing process to bog down. We have concluded
that this approach allows convenient access to the best features of both
software worlds, namely the power of rule-based processing and the
efficiency of LISP code. The problem with this approach is determining
which subproblems should in fact be rule-based. While this identification
process may take considerable experimentation, we are nevertheless
convinced that our use of both software options has greatly enhanced system
development.
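
The division of labor described above can be illustrated with a small
Python sketch (the actual system uses ART rules and LISP functions, neither
of which is reproduced here). The rule engine, the two rules, and the
isolate_fault procedure are all hypothetical stand-ins: a rule decides when
fault isolation is appropriate, and the isolation itself runs as ordinary
procedural code.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    condition: Callable[[dict], bool]  # tested against the fact base
    action: Callable[[dict], None]     # may call ordinary procedural code


def isolate_fault(readings: list) -> int:
    """Procedural strategy (hypothetical): index of the first failing test point, or -1."""
    for i, ok in enumerate(readings):
        if not ok:
            return i
    return -1


def run(rules: list, facts: dict) -> dict:
    """Repeat passes over the rules until a pass fires nothing; each rule fires at most once."""
    fired = set()
    progress = True
    while progress:
        progress = False
        for rule in rules:
            if rule.name not in fired and rule.condition(facts):
                rule.action(facts)
                fired.add(rule.name)
                progress = True
    return facts


RULES = [
    # The rule layer decides WHEN to isolate; the procedure decides HOW.
    Rule("alarm-starts-isolation",
         lambda f: f.get("alarm") and "fault_at" not in f,
         lambda f: f.update(fault_at=isolate_fault(f["readings"]))),
    Rule("fault-found-recommend-patch",
         lambda f: f.get("fault_at", -1) >= 0,
         lambda f: f.update(advice=f"patch around test point {f['fault_at']}")),
]
```

Calling run(RULES, {"alarm": True, "readings": [True, True, False, True]})
leaves the fact base with the fault located at test point 2 and a
recommendation attached, without the rule matcher ever stepping through the
individual readings.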

3.2.  Hardware  Environment 

Although the cost of specialized LISP processing hardware is high, the
investment is worthwhile for system prototyping and development because the
fast execution of LISP greatly enhances programmer productivity. As noted
above, however, high costs may exclude specialized LISP machines from
consideration as delivery vehicles. A possible solution is becoming
available in the form of conventional computer workstations, which are
beginning to attain enough processing power to run LISP at an acceptable
speed for delivery purposes. Also, Inference Corporation has recently begun
offering the capability of transforming LISP-based ART systems into the C
language for delivery on one of the numerous machines that support C; while
this has the advantage of avoiding ART-related problems, it does not
resolve the probable incompatibility of graphics and I/O systems.

3.3.  Database  Considerations 

The combination of specialized symbolic processing hardware, the ART
shell and a large amount of data presents a complex database problem.
Initially we implemented the database in Schemata, the frame-based
knowledge representation mechanism provided by ART. As the system expanded,
it became obvious that the Schemata system involved unacceptable overhead
and would be unable to handle the Expert Tech Controller database as it
grew toward the required size. At this point we made a decision to retain
the frame-based structure and increase efficiency by converting the
database to Flavors, the Symbolics object-oriented programming facility.
This improved matters, although it introduced a new problem in that the
inferencing mechanism of ART cannot directly access information in the
Flavors-based database. Our solution to this problem has been to select
the database information that is relevant to a problem-solving session and
convert it from Flavors to Schemata in advance.
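
The select-and-convert step can be pictured with the following Python
sketch. It is not the Flavors or Schemata interface: the Device class
stands in for an object in the efficient object database, the
(frame, slot, value) triples stand in for the frame form visible to the
inference mechanism, and the device names and slot names are invented for
the example.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Stand-in for an object held in the object-oriented database."""
    name: str
    kind: str
    circuits: list = field(default_factory=list)


def to_frames(devices: list, circuit: str) -> list:
    """Convert only the devices on `circuit` into (frame, slot, value) triples."""
    frames = []
    for d in devices:
        if circuit in d.circuits:
            frames.append((d.name, "instance-of", d.kind))
            frames.append((d.name, "carries", circuit))
    return frames
```

Because only the devices on the circuit under diagnosis are converted, the
inference mechanism sees a small set of frames regardless of how large the
full database becomes.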

Recently  we  have  begun  addressing  the  issues  affecting  the  realization 
of  a  more  permanent  database.  The  only  reasonable  way  to  deal  with  the 
quantity  of  data  needed  for  problem  domains  like  the  Expert  Tech  Controller 
is  to  store  it  on  disk.  This  issue  will  require  considerable  work,  and 
will  be  discussed  in  the  future. 

3.4.  Knowledge  Engineering 

We have found that there are a number of keys to effective knowledge
engineering. First, there is no substitute for a bona fide expert with
extensive experience in actually solving the problems in question. We
initially used manufacturers' manuals for some equipment items as
supplementary sources of information that could be accessed at home, at our
leisure; however, it turned out that the more formal diagnostic procedures
in the manuals were unnecessarily detailed and inefficient, compared with
the shortcuts and rules of thumb typically used by the human experts.


It is also desirable to have more than one expert. This has the
advantage of preventing the system from becoming distorted by one person's
idiosyncrasies, and the disadvantage of forcing the knowledge engineer to
deal with conflicting expert opinions. A major goal of the knowledge
engineer must be to merge these diverse opinions in a way that leads to the
most consistent and effective procedures.

We have found that one good way to motivate the domain experts is to
quickly and accurately implement their suggested changes before the next
knowledge engineering session. Perceived problems in the system's
reasoning will not bother the experts so much, once they realize how
quickly they can get changes made.

Although  distance  has  limited  the  frequency  of  our  interactions  with 
the  domain  experts,  our  experience  indicates  that  shorter,  more  frequent 
knowledge  engineering  sessions  may  be  more  effective  than  longer,  less 
frequent  ones.  This  provides  for  quick  feedback  on  a  smaller  number  of 
changes,  and  leads  to  more  efficient  use  of  the  expert's  time. 

3.5.  The  User  Interface

When dealing with end users and domain experts having little or no
computer experience, the user interface is a critical consideration. There
is high potential for a communication bottleneck. The issue is complicated
in many cases, as in the Expert Tech Controller, when detailed graphics are
essential for accurately communicating about the problem domain. We have
found that user friendliness (an overused term) and effective transfer of
information content should be the major goals of the user interface. High
resolution bit-mapped graphics and extensive use of the mouse can aid in
achieving both of these goals.


Another  user  interface  issue  we  are  addressing  is  that  of  minimizing 
the  demands  upon  the  operator.  In  the  Expert  Tech  Controller  domain,  many 
requested  inputs  are  simply  readings  of  meters,  device  alarm  panels,  and 
other  indicators.  The  process  of  making  these  observations  and  inputting 
the  results  can  be  time-consuming  and  dull  to  the  user.  Automating  some  or 
all  of  these  inputs  can  make  more  efficient  use  of  time  for  both  the  expert 
system  and  the  user. 

3.6.  Knowledge  Representation 

The  choice  of  knowledge  representation  is  probably  the  most  critical 
issue  in  expert  system  design  and  implementation.  It  is  also  the  major 
bottleneck  in  enabling  inexperienced  knowledge  engineers,  or  domain  experts 
themselves,  to  build  expert  systems.  Although  the  frame-based  mechanisms 
offered  by  state-of-the-art  expert  system  shells  are  very  powerful,  most 
domains  will  still  require  considerable  customization  to  accurately  portray 
domain  information. 

The  knowledge  representation  technique  used  in  the  Expert  Tech 
Controller  could  best  be  characterized  as  a  modified  frame-based 
structure.  The  man-hours  expended  in  the  modification  process,  however, 
must  not  be  underestimated.  It  is  a  slow  process,  and  it  appears  to  take 
place  throughout  system  development.  It  should  be  noted  that  Symbolics 
LISP  provides  an  excellent  environment  for  making  these  modifications. 

3.7.  Data  Acquisition 

Data acquisition can be a major hurdle when trying to get an expert
system from the prototype stage to the delivery stage. In the Expert Tech
Controller, the feasibility of fault diagnosis can currently be
demonstrated on a few tens of circuits. To be useful in the field,
however, ETC must contain data on several hundred circuits.

At  present  we  are  building  a  data  acquisition  system  that  will  allow 
the  tech  controllers  to  conveniently  enter  data  for  large  numbers  of 
additional  circuits.  This  will  greatly  decrease  the  amount  of  time 
currently  spent  on  this  process  by  the  knowledge  engineers.  It  will  also 
provide  a  mechanism  for  circuit  addition  and  modification  after  the  expert 
system  is  fielded.  The  creation  of  this  data  acquisition  mechanism  is  a 
challenging  problem  because  it  must  be  simple  to  use,  yet  has  to  convert 
the  information  obtained  into  the  complex  data  structures  expected  by  fault 
diagnosis  and  other  modules. 

3.8.  Knowledge  Acquisition 

Currently,  the  only  method  for  adding  knowledge  to  the  Expert  Tech 
Controller  is  for  the  knowledge  engineer  to  add  LISP  code  or  ART  rules. 
Automation  of  the  knowledge  acquisition  process  is  a  topic  being  addressed 
by  many  researchers.  To  the  extent  that  we  do  address  this  topic  in  the 
future,  our  emphasis  is  likely  to  be  placed  on:  modularizing  the  system  to 
enhance  automatic  production  of  modules;  creation  of  a  method  to  go  from 
the  specifications  for  a  new  device  to  diagnostic  rules  and  procedures  for 
the  device;  use  of  inherited  diagnostics  for  device  "types";  and 
development  of  modules  that  automatically  build  the  procedures  to  enhance 
graphics,  in  terms  of  both  circuit  displays  and  alarm  panel  displays. 

4.  TESTBED  ARCHITECTURE  STUDY 

It appears that Machine Intelligence techniques offer possibilities
for significant improvements at many levels of System Control in the
Defense Communications System (DCS). The Expert Tech Controller project
described above, which addresses the foundation layer of System Control,
was chosen for the initial implementation because it involves a set of
clearly definable problems and existing centers of expertise for solving
them, and promises to yield interesting results in the near term. As a
complement to this project, we have undertaken a study in FY86 of the
applications of Machine Intelligence techniques at other levels of the
System Control structure. This section of the report describes the results
of the study, specifically including recommendations for a simulation-based
testbed architecture for evaluating System Control techniques. The study
results are presented in the broader sense as issues of importance for the
Government to consider in planning future programs, rather than specific
proposals for new work.

We first examine the projected direction of advances in DCS
communications technology and organization, and then identify problem areas
in which machine intelligence would be of benefit. This provides a basis
for determining, in a general way, the expected functionality of future
control systems which incorporate machine intelligence. Following from
this description is an outline of the form these systems might take, and a
discussion of the research problems which must be addressed in order to
build these systems. At the conclusion we have recommendations for
near-term research objectives, work to follow after these objectives have
been met, and some long-term goals.


4.1  The  Future  DCS 


The DCS is currently evolving in terms of both communications
technology and organizational structure. As more and more digital
communications equipment has been introduced into the network, the
possibilities for automated data collection and control have increased
greatly. The complexity of system control problems has also grown as a
result of these changes. Although digital equipment generally offers the
advantage of robust operation, requiring little day-to-day maintenance, the
additional functionality provided often means that human operators must
have a sophisticated level of knowledge to understand the optimum ways to
test, diagnose, and reconfigure the equipment. The increased reliability
of digital equipment has also reduced the opportunities for human operators
to gain real-time practical experience in network control.

The organization of the DCS control structure is also changing with
the introduction of the concept of subregion control facilities. This
change will place a greater emphasis on distributed control of the circuit
switched network, and integration of monitoring and control across the
various DCS subsystem boundaries. In the past, control of the transmission
system and each of the various networks which utilize the system has
operated in a rather independent manner. As the future DCS evolves, a
system controller will be expected to integrate status data across
subsystems and networks, and make decisions regarding allocation of
resources which may have wide-ranging implications. It will become
necessary for a controller to know the answers to "What if ... ?" questions
for a much wider range of situations than is currently expected.


In summary, the DCS is becoming more reliable, and data collection and
control execution are becoming more automated, but these changes may place
a greater burden on system control. Effective control requires carefully
considered decision making, a task which is becoming more difficult as a
result of increased complexity. At the same time, system control operators
are developing less experience with problems, especially those of the
magnitude and scope which might arise in a real crisis. It is in this area
of providing automated assistance for decision making, particularly in
stress conditions, that the application of machine intelligence techniques
has the greatest potential benefit for system control.

4.2  Problem  Areas  for  Machine  Intelligence  Applications 

There is a wide range of problem areas which might be addressed in an
effort to provide automated assistance for decision making. Within the DCS
control structure at the upper levels (e.g., ACOC or DCAOC), decision
making requires reasoning about what data is important and must be
considered, and what data is not directly relevant to the problem at hand.
Further, a controller must be able to determine quickly what, if any,
additional data is needed and how to find it. Thus, rather than present an
operator with an enormous collection of facts gathered from several
networks and subsystems around the world, it would be much more effective
to interpret this data, using knowledge about these networks, and the role
of this operator in controlling them. Once a particular crisis situation
has been recognized, the controller often has to select from a large set of
alternatives in responding to the crisis. An intelligent aide would assist
in this task by generating appropriate plans and recommending the best
choices. Such a system would embody aspects of sensor data fusion,
situation assessment, intelligent data base query, interactive task
recognition, and planning.

This  problem  might  be  approached  in  several  steps.
Initially,  one  or  more  independent  automated  decision  aids  might  be 
developed.  These  would  represent  low-risk,  well  defined  tasks.  When 
completed,  each  aid  would  be  a  useful  product,  albeit  in  prototype  form. 
More  significantly,  the  work  involved  in  designing,  implementing,  testing 
and  evaluating  each  aid  would  be  beneficial  in  building  a  base  of  knowledge 
and  experience  with  the  details  of  the  problem  domain.  As  this  technology 
matures,  the  more  difficult  research  problems  should  be  addressed.  We  may 
envision  an  evolution  of  these  independent  aids  toward  an  integrated  system 
of  cooperative,  autonomous  agents.  The  role  of  these  problem-solving 
systems  would  shift  from  being  an  automated  tool  for  the  human  controller 
to  becoming  a  member  of  the  system  management  and  control  team.  Although 
such  systems  are  clearly  beyond  today’s  technology,  they  serve  as  useful 
goals  in  understanding  the  direction  for  current  research. 

We now focus on a specific problem, namely network control for the
Defense Switched Network (DSN). This network is of interest because it
represents the introduction of a new system control problem. Unlike
existing networks with which there is a large base of experience in network
management, the DSN control must be developed from the ground up. Although
there are many man-years of experience with circuit-switched voice
networks, such as AUTOVON, the DSN differs significantly from a control
perspective. First, the DSN (as configured for the European theater) will
utilize many more, but smaller, switches than AUTOVON. While this tends to
increase control flexibility, it also means the network will behave
differently. The choice of control action for any given situation in the
DSN will not necessarily be the same as has been used for AUTOVON. Second,
because of the larger number of switches, it is unlikely that a single,
unaided controller will be able to maintain the level of cognizance over
the entire network (in Europe alone) necessary for optimum
management. Third, as a result of the changes in DCS organizational
structure mentioned earlier, the control of DSN will be distributed, in
part, to the subregion control level, thus making control less centralized
than it is with AUTOVON.

4.3  Architectures  for  Machine  Intelligence  Systems

At  the  core  of  typical  machine  intelligence  systems  are  knowledge 
about  the  physical  world  of  interest  and  a  reasoning  capability  (the 
inference  engine).  The  design  issues  are  associated  with  acquiring, 
formalizing  and  representing  this  knowledge,  and  determining  efficient 
control  strategies  for  using  the  knowledge  to  reason  about  the  problems  to 
be  solved.  We  illustrate  an  approach  to  this  design  process  with  an 
example  based  on  the  DSN  control  problem  mentioned  previously. 

The key to designing an effective, knowledge-based DSN controller is a
complete and detailed understanding of the knowledge needed to interpret
the available data, to assess the current status of the network, and to
recommend appropriate control actions. This knowledge takes a variety of
forms, but may be divided into two broad categories: empirical and
theoretical.

Empirical knowledge is derived from observations and experience. For
example, past experience might tell us that during peak traffic periods the
failure of one specific switch is likely to result in traffic overload at
three others. Further, we may have observed that a particular control
action, such as changing routing tables to distribute this load over a
larger set of intermediate switches, would reduce the overload to
manageable proportions.

Theoretical  knowledge  is  based  on  the  underlying  physical  and 
mathematical  models  we  have  developed  for  describing  the  behavior  of  real 
networks.  We  know  from  fundamental  theory,  for  example,  that  the  loss  of 
all  trunks  connecting  one  part  of  the  network  to  the  rest  will  result  in 
isolation  of  that  part,  and  the  failure  of  all  call  attempts  between  the 
isolated  parts.  This  knowledge  suggests  a  control  action  blocking  all  such 
call  attempts  at  their  source  so  as  not  to  overload  the  network  with 
attempts  doomed  to  failure. 
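
The isolation example can be worked through mechanically, since it depends
only on the connectivity of the network graph. The following Python sketch
is an illustration under simplifying assumptions (undirected trunks, every
trunk either fully up or fully failed); the function names and the toy
network are invented for the example and do not describe the DSN.

```python
from collections import deque


def components(switches, trunks, failed):
    """Group switches into connected parts, ignoring failed trunks."""
    adjacent = {s: set() for s in switches}
    for a, b in trunks:
        if (a, b) not in failed and (b, a) not in failed:
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen, parts = set(), []
    for start in switches:
        if start in seen:
            continue
        # Breadth-first traversal collects everything reachable from `start`.
        part, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacent[node] - part:
                part.add(nxt)
                queue.append(nxt)
        seen |= part
        parts.append(part)
    return parts


def blocked_pairs(switches, trunks, failed):
    """Source/destination pairs whose calls cannot complete and could be blocked at the source."""
    part_of = {}
    for i, part in enumerate(components(switches, trunks, failed)):
        for s in part:
            part_of[s] = i
    return {(a, b) for a in switches for b in switches
            if a != b and part_of[a] != part_of[b]}
```

For a chain of four switches A, B, C, D, losing the single trunk between B
and C splits the network into two parts, and blocked_pairs returns exactly
the pairs with one end in each part.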

While these are admittedly oversimplified examples, they are intended
only to illustrate some of the differences between empirical and
theoretical knowledge. We observe that empirical knowledge often involves
approximations and making judgments. Using empirical knowledge in these
situations means that we must reason with inexact information or in the
presence of uncertainty. For highly complex systems such as the DSN, it
should be clear that although there may be great volumes of theoretical
knowledge about each of the network components, our knowledge about the
dynamics of network behavior must come largely from empirical evidence.
The system is much too complex to be described by detailed mathematical
models. Thus for "expert-level performance", theory-based knowledge alone
is not sufficient, but must be coupled with knowledge derived from
experience and human insight.

Another  perspective  from  which  knowledge  may  be  viewed  is  based  on 
what  the  knowledge  describes,  rather  than  on  how  it  was  acquired.  This  is 
a  significant  distinguishing  characteristic  because  it  often  influences  the 
choice  of  knowledge  representation.  In  the  case  of  the  DSN  we  need 
knowledge  about  the  structure  and  form  of  the  network;  knowledge  about  the 
function  of  various  network  components;  knowledge  which  describes  expected 
network  behavior  under  various  traffic  conditions;  and  knowledge  of 
alternative  control  actions,  including  conditions  under  which  actions 
should  or  should  not  be  invoked.  It  is  unlikely  that  a  single  form  of 
knowledge  representation  would  suffice  for  all  of  these  categories. 

Finding  efficient  control  strategies  for  reasoning  with  this  knowledge 
is  the  second  major  design  issue.  The  number  of  alternative  choices  in 
solving  network  control  problems  is  so  large  that  simple  exhaustive 
searching,  especially  under  the  demands  of  near  real-time  decision  making, 
is  not  feasible.  In  addition,  the  issue  of  uncertain  or  inexact 
conclusions  complicates  even  the  simplest  approaches.  One  approach  is  to 
use  data  from  various  sources  to  confirm  hypotheses.  This  often  provides  a 
mechanism  for  effectively  using  inexact  or  uncertain  data.  The  problem  of 
large  search  spaces  may  be  addressed  by  using  human  insight  about  network 
behavior  to  develop  heuristics  for  guiding  the  search. 
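
As a toy illustration of heuristic guidance, the following Python sketch
performs a best-first search over load-shifting actions. Everything
specific in it is an assumption made for the example: the per-switch
capacity, the fixed amount of load moved by each action, and the choice of
total overload as the heuristic. It is meant only to show how a measure of
"how bad is this state" lets the search avoid enumerating every
alternative.

```python
import heapq
from itertools import count

CAPACITY = 5  # hypothetical per-switch load limit


def overload(loads):
    """Heuristic: total load above capacity, summed over switches."""
    return sum(max(0, x - CAPACITY) for x in loads)


def shifts(loads, step=2):
    """Successors: move `step` units of load from switch i to switch j."""
    for i in range(len(loads)):
        for j in range(len(loads)):
            if i != j and loads[i] >= step:
                new = list(loads)
                new[i] -= step
                new[j] += step
                yield (i, j), tuple(new)


def best_first(start, goal_test, successors, heuristic, max_expansions=1000):
    """Expand the most promising state first; give up after max_expansions."""
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start), next(tie), start, [])]
    visited = {start}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, plan = heapq.heappop(frontier)
        if goal_test(state):
            return plan
        expansions += 1
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt), next(tie), nxt, plan + [action]))
    return None
```

Starting from loads (9, 2, 1), the call
best_first((9, 2, 1), lambda s: overload(s) == 0, shifts, overload)
returns a short sequence of (from, to) shifts that brings every switch
within capacity. Best-first search of this kind does not guarantee the
shortest plan in general; the expansion limit reflects the need to answer
in bounded time.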

Each of these techniques adds complexity to the overall system. If
not carefully managed, this complexity may easily overcome the designers'
ability to complete a successful operational system. A modular system may
be built by using multiple, specialized problem solving agents which reason
cooperatively to solve the problem. In the design of a machine intelligent
controller for DSN we see a probable need for four such agents: an
assessment agent, a planner, a routing strategist, and a controls
strategist. The assessment function attempts to interpret data from DSN
switches and other "external" data, such as transmission equipment status,
so as to form conclusions about the current state of the network. The
planner uses goals for desired network behavior and the conclusions from
assessment to generate plans which guide the overall response of the
control system to network problems. The routing strategist and the
controls strategist represent specialized knowledge sources which
incorporate both empirical and theoretical knowledge needed to answer
questions such as "What is likely to happen if code blocking is introduced
as a control action?", or "Which routing and preemption procedures are most
likely to allow the greatest number of higher precedence calls to be
completed under current network conditions?"

4.4  Research  Issues

It  should  now  be  clear  that  there  are  a  number  of  interesting  research 
questions  to  be  answered  in  developing  architectures  for  future 
applications.  For  the  specific  problem  of  DSN  control,  we  are  forced  to 
ask  how  we  can  acquire  the  necessary  empirical  knowledge.  Not  only  do  we 
not  have  a  base  of  experience  with  DSN  control,  the  network  is  not  yet 
complete.  To  wait  for  a  completed  network  and  the  time  necessary  for 
humans  to  become  proficient  seems  unreasonable.  An  intelligent  network
control  aid  would  perhaps  be  most  useful  when  human  operators  are  least
experienced.  One  approach  is  to  develop  the  necessary  insight  and
empirical  knowledge  in  one  or  two  highly  skilled  individuals  by  repeated
use  of  simulations.  These  persons  would  then  become  the  "experts"  from
whom  the  knowledge  could  be  acquired.  The  next  logical  step  is  to 
investigate  how  closely  these  two  processes  may  be  linked;  are  there 
techniques  for  automating  much  of  this  learning  and  transfer  of  knowledge? 

From  a  broader  perspective,  we  are  concerned  with  how  new  applications 
are  tested  and  evaluated.  This  is  an  important  area  for  systems  which  we 
want  eventually  to  be  placed  in  the  hands  of  operators  having  weak  training 
and  essentially  no  understanding  of  how  these  systems  work.  We  know  from 
past  experience  that  machine  intelligence  applications  are  often  very 
fragile.  As  the  problem  size  grows  to  realistic  proportions,  the 
complexity  and  scope  of  problem-solving  demands  may  exceed  a  system's
capabilities  in  ways  which  lead  to  total  failure.  For  systems  in  a  real-world
environment,  this  is  not  acceptable.  We  need  tools  and  facilities  to 
conduct  tests  which  push  the  problem  solving  demands  to  the  limits  of  these 
systems.  It  is  vital  for  continuing  progress  to  know  where  and  why  systems 
fail.  As  more  and  more  new  applications  are  developed  —  first  as 
prototypes,  then  followed  by  optional  field  versions  —  we  need  ways  to 
evaluate  system  performance  under  realistic  conditions  without  disrupting 
the  real  world. 

4.5  Recommendations

For  the  near  term  we  propose  that  the  existing  call-by-call  simulator 
developed  by  Lincoln  Laboratory  be  investigated  for  use  as  a  tool  in
developing  the  empirical  knowledge  needed  for  DSN  control.  A  preliminary
examination  has  indicated  that  some  enhancements  would  be  necessary;  for 
example,  additional  network  control  actions  are  required.  This  is  a  fairly 
short-term,  low-risk  investment,  since  the  bulk  of  the  simulator 
development  has  already  been  done. 

Objectives  for  follow-on  work  would  center  around  the  process  of 
integrating  the  simulation  environment  with  a  knowledge-based  system.  The 
first  step  might  be  done  in  a  totally  manual  fashion,  with  a  single 
individual  exercising  the  simulator  to  gain  insight  about  network  behavior 
and  then  using  an  expert  system  building  tool  to  construct  a  prototype 
system.  This  would  provide  an  initial  framework  in  which  one  could  then 
attempt  to  link  simulation  and  expert  system  together.  A  goal  of  this 
effort  would  be  to  develop  a  human-assisted  machine  learning  environment  in 
which  knowledge-based  systems  could  be  prototyped  by  interacting,  under 
human  guidance,  with  simulations  of  the  problem  domain. 

Long-term  goals  should  include  the  development  of  communications 
network  simulation  tools  and  techniques  for  effectively  integrating  these 
tools  with  knowledge-based  systems.  There  may  be  several  existing
simulation  tools  which  have  been  developed  by  other  contractors  for 
previous  studies,  and  there  will,  no  doubt,  be  more  in  the  future.  As  the 
machine  intelligence  technology  continues  to  develop,  additional 
knowledge-based  systems  are  likely  to  be  produced.  What  will  be  needed  is 
a  common  facility  to  provide  a  testbed  for  evaluating  these  systems  in 
realistic  environments. 
FY87  PLANS


5.1  Demonstrations 

The  major  focus  of  our  work  in  early  FY87  will  be  preparation  for  a 
series  of  demonstrations  to  be  carried  out  at  Andrews  AFB  in  the 
February/March  1987  time  frame.  For  these  demonstrations  we  are  shipping  a 
Symbolics  workstation  to  Andrews  with  the  expectation  of  having  it  up  and 
running  there  by  the  end  of  December  1986.  Our  plan  is  to  start  working 
with  personnel  at  Andrews  so  that  by  the  time  of  the  demonstrations  they 
can  be  proficient  enough  in  the  use  of  ETC  to  participate  in  a  major  way. 
Ideally,  all  terminal  interactions  would  be  carried  out  by  AF  personnel 
with  Lincoln  Laboratory  involvement  limited  to  explanation  and  discussion. 

In  order  to  carry  out  a  representative  demonstration  of  ETC  working  at 
a  useful  level  we  need  to  extend  the  diagnostic  and  data  entry  capabilities 
beyond  those  that  we  showed  in  the  year-end  review  in  September.  In 
particular,  we  need  to  continue  the  fault  isolation  process  beyond  the 
point  at  which  we  have  effected  service  restoration,  in  order  to  pinpoint 
the  failed  component.  In  many  situations  this  post-restoration  fault 
isolation  involves  the  parallel  connection  of  spare  equipment  and 
comparison  tests  between  the  behavior  of  the  spare  and  the  unit  in 
question.  Much  of  the  software  needed  for  post-restoration  isolation  existed
at  the  time  of  the  annual  review  but  was  not  shown  because  the  demonstrated 
procedure  stopped  when  restoration  was  achieved. 

Another  diagnostic  capability  that  is  needed  for  the  Andrews 
demonstrations  is  an  ability  to  handle  digital  trunk  problems.  In  the 
annual  review  we  were  limited  to  faults  on  individual  lines  or  channels  in
a  trunk  circuit.  If  more  than  one  circuit  in  a  trunk  is  experiencing 
problems,  the  entire  trunk  is  suspect,  and  a  different  diagnostic  procedure 
is  indicated.  A  trunk  problem  can  evolve  from  a  problem  with  an  individual 
circuit  (channel),  or  it  can  appear  as  a  trunk  circuit  complaint  from  an 
adjacent  TCF.  The  same  diagnostic  procedure  applies  in  both  cases.  We 
need  new  displays  to  represent  the  trunk  problem  in  a  suitable  form  so  that 
the  controller  can  see  all  of  the  test  points  for  the  trunk  multiplexer. 
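
The rule for choosing between the circuit-level and trunk-level procedures can be stated compactly. The sketch below is only an illustration of that rule as described above; the function name, the circuit-to-trunk mapping, and the identifiers are hypothetical and are not taken from the ETC software.

```python
from collections import defaultdict


def choose_procedures(circuit_complaints, circuit_to_trunk, trunk_complaints=()):
    """Map each affected trunk to the diagnostic procedure indicated.

    circuit_complaints: circuits (channels) reported as faulty.
    circuit_to_trunk:   hypothetical lookup from circuit id to trunk id.
    trunk_complaints:   trunks reported directly, e.g. by an adjacent TCF.
    """
    troubled = defaultdict(set)
    for circuit in circuit_complaints:
        troubled[circuit_to_trunk[circuit]].add(circuit)

    procedures = {}
    for trunk, circuits in troubled.items():
        # More than one troubled circuit makes the whole trunk suspect.
        procedures[trunk] = "trunk" if len(circuits) > 1 else "circuit"

    # A direct trunk complaint leads to the same trunk procedure.
    for trunk in trunk_complaints:
        procedures[trunk] = "trunk"
    return procedures
```

With circuits C1 and C2 on trunk T1 and circuit C3 on trunk T2, complaints on all three circuits select the trunk procedure for T1 and the circuit procedure for T2.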

We  believe  that  the  above-mentioned  extensions  to  the  diagnostic 
capabilities  can  be  ready  in  good  time  for  the  demonstrations,  and  moreover 
that  we  will  be  able  to  deal  with  additional  complaints  such  as  receiving 
garble  on  a  teletype  circuit  and  excessive  transmission  problems  on  a 
computer-to-computer  modem  circuit.  Ideally,  we  would  also  be  able  to  deal
with  another  class  of  problems  caused  by  broken  wires  and  dirty  jacks  in 
the  TCF.  We  are  not,  however,  confident  that  work  in  the  latter  area  will 
be  ready  to  show  at  the  time  of  the  formal  demonstrations. 

In  the  area  of  report  generation  we  expect  to  have  carried  the  Form 
1443  "Trouble  and  Restoration  Record"  to  a  point  where  it  correctly 
represents  the  state  of  the  diagnosis/restoration  procedure  that  has  been 
reached  by  ETC.  In  the  real  TCF  world  the  1443  report  is  not  completed 
until  the  circuit  is  returned  to  normal  operation  after  repairs  have  been 
finished  and  the  circuit  has  been  fully  checked  out  again.  We  do  not  plan 
to  include  this  final  phase  of  outage  processing  in  the  demonstrations. 

In  the  area  of  data  entry,  we  need  to  extend  the  CADET  program 
described  in  Section  2  so  that  it  can  handle  trunk  circuits  and  new
instances  of  devices  already  known  in  generic  terms.  Further,  CADET  must
be  extended  to  cause  a  newly  entered  circuit  to  become  a  permanent  part  of 
the  data  base  so  that  we  can  expand  the  database  to  an  interesting  size  by 
the  time  of  the  demos. 

In  order  to  show  the  potential  for  direct  connection  between  ETC  and 
the  communication  equipment  that  will  be  available  in  future  more-automated 
technical  control  facilities,  we  plan  to  connect  an  AN/FCC-100  time- 
division  multiplexer  to  ETC  using  RS-232  ports  on  the  multiplexer  and  the 
Symbolics  workstation.  The  connection  will  be  made  to  a  spare  FCC-100  at 
Andrews,  and  we  will  be  limited  to  showing  that  status  and  alarm 
information  can  be  sensed  by  ETC  and  that  the  ports  on  the  multiplexer  can 
be  configured  for  a  particular  use.  The  latter  step  is  needed  when  a  spare 
unit  is  to  be  placed  in  service.  Unfortunately,  the  FCC-100  does  not  allow 
all  of  its  capabilities  to  be  commanded  through  the  RS-232  port.  As  a 
result,  we  will  not  be  able  to  invoke  the  built-in  test  capabilities  of  the 
device.  These  can  only  be  accessed  manually  from  the  front  panel 
controls.  We  expect  that  this  connection  can  be  demonstrated  in  February 
as  an  independent  feature.  At  that  time,  this  new  feature  will  not  be 
integrated  into  the  circuit  diagnosis  procedure  because  the  spare  FCC-100 
is  not  part  of  any  circuit  at  Andrews.  At  a  later  stage  we  hope  to  be  able 
to  make  a  connection  to  another  FCC-100  and  demonstrate  an  ability  to  sense 
a  failure  condition  and  go  directly  to  a  fault  isolation/restoration 
procedure.
5.2  Plans  for  Extending  ETC

There  are  many  different  directions  in  which  ETC  must  be  extended  in 
order  to  approach  the  breadth  of  capability  that  a  well-trained  tech 
controller  would  have.  In  the  preceding  section  we  noted  some  directions 
in  which  we  expect  to  extend  capabilities  prior  to  the  February 
demonstrations.  In  this  section  we  discuss  other  directions  in  which  work 
is  needed.  The  total  amount  of  work  is  more  than  can  be  carried  out  in  the 
upcoming  year,  and  we  are  not  now  in  a  position  to  lay  out  a  detailed  plan
since  we  need  to  get  experience  with  ETC  in  the  Andrews  environment  to
determine  whether  to  work  toward  greater  depth  or  breadth  of
coverage.  In  the  end  we  seek  both  depth  and  breadth,  but  there  may  well  be
greater  interim  utility  in  pursuing  one  at  the  expense  of  the  other.  For
example,  we  now  have  a  capability  to  deal  with  simple  problems  in  data 
circuits,  but  we  have  no  capability  at  all  to  deal  with  voice  circuit 
problems.  We  could  leave  the  data  capability  at  its  present  level  and  work 
on  building  the  voice  capability  to  a  comparable  level  of  sophistication. 
Alternatively,  we  could  continue  to  develop  the  data  capability  while 
allowing  sophistication  in  the  voice  area  to  lag  that  in  the  data  area.
The  latter  approach  could  make  the  system  more  useful  to  people  at  Andrews
during  the  development  period,  particularly  if  trainees  there  were  having
more  trouble  handling  data  circuit  problems.  We  expect  to  work  out  this
breadth/depth  tradeoff  in  consultation  with  the  expert  tech  controllers  at
Andrews  once  they  have  had  a  suitable  opportunity  to  assess  the
capabilities  already  in  place.

In  the  data  circuit  area  we  have  still  to  add  capabilities  to  handle 
some  simple  devices  that  shift  signal  levels  and  standardize  timing  for 
teletype  signals.  We  also  have  a  large  piece  of  work  to  accomplish  in 
handling  fault  diagnosis  on  multipoint  teletype  circuits.  (The  diagnosis 
itself  should  be  relatively  straightforward  since  the  multipoint  circuit 
can  be  thought  of  as  a  collection  of  simple  circuit  segments,  but 
significant  changes  are  needed  to  deal  with  the  more  complex  graphical 
representation  of  the  circuits.)  Still  more  work  will  be  needed  to  handle 
systems  that  combine  VFCT  channels  with  modems  to  accommodate  circuits 
needing  a  higher  data  rate  than  the  75  bits  per  second  offered  by  a  normal 
VFCT  channel.  There  may  well  be  other  special  data  circuit  configurations 
of  which  we  are  not  yet  aware. 

Further  work  is  needed  in  the  database  area  to  remember  patches  and 
equipment  and  line  outages  across  time  so  that  diagnoses  of  new  problems 
can  take  account  of  patches  already  made,  spares  already  used,  etc.  New 
procedures  are  needed  to  deal  with  the  checkout  of  repaired  equipment  and
restored  circuits  so  that  the  outage  reports  can  be  finished  off  and  the 
database  changed  to  reflect  the  return  to  normal  status. 
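
As a rough illustration of the bookkeeping this implies, the sketch below keeps track of which spares are tied up by open outages so that a later diagnosis does not propose a spare already in use, and releases them when a circuit is returned to normal. The class, its fields, and the identifiers are hypothetical; the actual ETC database design is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class OutageLog:
    """Remembers patches and spares across outages (hypothetical structure)."""
    spares_in_use: dict = field(default_factory=dict)  # spare id -> circuit id
    open_outages: set = field(default_factory=set)     # circuits not yet restored to normal

    def record_patch(self, circuit, spare):
        """Note that a spare has been patched in to restore a circuit."""
        if spare in self.spares_in_use:
            raise ValueError(f"{spare} is already patched into {self.spares_in_use[spare]}")
        self.spares_in_use[spare] = circuit
        self.open_outages.add(circuit)

    def available_spares(self, all_spares):
        """Spares that a new diagnosis may still propose."""
        return [s for s in all_spares if s not in self.spares_in_use]

    def close_outage(self, circuit):
        """Return a circuit to normal status and release its spares."""
        self.open_outages.discard(circuit)
        for spare, patched in list(self.spares_in_use.items()):
            if patched == circuit:
                del self.spares_in_use[spare]
```

After record_patch("CKT-7", "SPARE-A"), a request for available spares no longer offers SPARE-A until close_outage("CKT-7") is called, which corresponds to finishing the outage report and restoring normal status.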